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Abstract 

Some  key  features  of  a  mathematical  description  of  an  immune  response  are  an  estimate  of  the  number  of 
responding  cells  and  the  manner  in  which  those  cells  divide,  differentiate,  and  die.  The  intracellular  dye  CFSE 
is  a  powerful  experimental  tool  for  the  analysis  of  a  population  of  dividing  cells,  and  numerous  mathematical 
treatments  have  been  aimed  at  using  CFSE  data  to  describe  an  immune  response  [30,  31,  32,  37,  38,  41,  47, 
48].  Recently,  partial  differential  equation  structured  population  models,  with  measured  CFSE  fluorescence 
intensity  as  the  structure  variable,  have  been  shown  to  accurately  fit  histogram  data  obtained  from  CFSE  flow 
cytometry  experiments  [18,  19,  51,  53].  In  this  report,  the  population  of  cells  is  mathematically  organized 
into  compartments,  with  all  cells  in  a  single  compartment  having  undergone  the  same  number  of  divisions.  A 
system  of  structured  partial  differential  equations  is  derived  which  can  be  fit  directly  to  CFSE  histogram  data. 
From  such  a  model,  cell  counts  (in  terms  of  the  number  of  divisions  undergone)  can  be  directly  computed  and 
thus  key  biological  parameters  such  as  population  doubling  time  and  precursor  viability  can  be  determined. 
Mathematical  aspects  of  this  compartmental  model  are  discussed,  and  the  model  is  fit  to  a  data  set.  As 
in  [18,  19],  we  find  temporal  and  division  dependence  in  the  rates  of  proliferation  and  death  to  be  essential 
features  of  a  structured  population  model  for  CFSE  data.  Variability  in  cellular  autofluorescence  is  found  to 
play  a  significant  role  in  the  data,  as  well.  Finally,  the  compartmental  model  is  compared  to  previous  work, 
and  statistical  aspects  of  the  experimental  data  are  discussed. 


Key  words:  Cell  proliferation,  cell  division  number,  CFSE,  label  structured  population  dynamics,  partial 
differential  equations,  inverse  problems. 
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1  Introduction 


The  human  immune  response  is  a  complex  process  in  which  the  behavior  of  individual  cells  in  the  lymphatic 
system  is  altered  by  a  multitude  of  intra-  and  extracellular  signals.  The  mathematical  analysis  of  lymphocyte 
activation  and  division  can  be  performed  on  a  wide  range  of  scales,  from  the  molecular  level  (antigen  presentation 
and  recognition)  to  the  population  level.  This  report  focuses  on  the  latter.  To  that  end,  one  can  look  at  the 
total  number  of  divisions  a  cell  has  undergone  since  activation  and  how  cells  in  different  generations  differ  in 
phenotype. 

The  development  by  Lyons  and  Parish  [54]  of  the  intracellular  dye  carboxyfluorescein  succinimidyl  ester 
(CFSE)  for  use  in  proliferation  assays  has  resulted  in  an  essential  experimental  tool  for  researchers  studying  these 
complex  processes.  CFSE  is  nonradioactive  and  provides  long-lasting,  bright,  and  relatively  uniform  labeling  of 
all  cells  in  a  population  without  adversely  affecting  the  internal  machinery  of  the  cells.  When  cells  divide,  the 
dye  is  partitioned  approximately  in  half.  Thus,  when  labeled  cells  are  stimulated  to  divide,  the  CFSE  content 
of  individual  cells  in  the  population  can  be  assessed  via  flow  cytometry  and  the  number  of  divisions  a  cell 
has  undergone  can  be  determined  by  comparing  the  measured  fluorescence  intensity  of  a  cell  to  the  measured 
fluorescence  intensity  of  an  undivided  cell  [55,  54,  60,  61,  72,  74].  When  individual  cell  fluorescence  intensity 
measurements  for  all  cells  in  a  given  population  are  binned  into  a  histogram,  each  generation  of  cells  appears 
as  a  “peak”  in  the  histogram  data.  The  data  set  used  in  this  report  is  the  same  as  that  from  [18,  68]  and  the 
experimental  protocol  is  discussed  at  length  there.  The  data  is  depicted  in  Figure  1. 

While  the  quantitative  modeling  of  CFSE  data  has  traditionally  focused  on  the  deconvolution  of  the  data 
into  numbers  of  cells  per  generation  [30,  31,  32,  37,  38],  recent  efforts  [18,  19,  51,  53]  have  used  a  structured 
population  model  in  order  to  fit  the  CFSE  histogram  data  directly.  Using  this  technique,  we  have  produced 
a  strong,  physically  and  biologically  motivated  model  which  is  quite  capable  of  replicating  the  observed  CFSE 
histogram  data  obtained  via  flow  cytometry.  The  most  recent  partial  differential  equation  (PDE)  model  is  a 
fragmentation  equation  which  relates  the  structured  population  density  n(t,  x )  to  the  rates  of  proliferation  a(t,  x ) 


Figure  1:  Data  set  for  a  CFSE-based  proliferation  assay.  Note  that  the  data  is  presented  in  the  logarithmic 
coordinate  2  =  log10(x),  in  units  log  UI.  As  cells  divide,  CFSE  is  diluted  and  the  initially  unimodal  population 
density  becomes  multimodal.  While  it  is  easy  to  distinguish  the  various  peaks  in  the  data,  the  overlap  between 
peaks  results  in  some  systematic  error  when  attempting  to  identify  a  region  of  the  horizontal  axis  with  a  specific 
division  number.  In  addition  to  this  overlap,  the  slow  drift  to  the  left  over  time  (as  a  result  of  intracellular 
turnover  of  CFSE)  further  weakens  the  correlation. 
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and  death  /?(£,  x)  under  the  assumption  of  Gompertz  decay  of  label  and  is  given  by 


dn(t,  x )  _kt  d[(x  -  xa)n(t,  a;)] 

dt  06  dx 


(■ a(t ,  x)  +  /3(t,  x))n(t,  x)  +  X[xa,x*}^ a(t,  2x  -  xa)n(t,  2x  -  xa).  (1) 


The  structure  variable  x  is  the  fluorescence  intensity  (in  arbitrary  units  of  intensity,  UI)  of  the  cells.  Because  this 
fluorescence  intensity  arises  primarily  from  CFSE  within  the  cell,  we  refer  to  this  as  a  label  structured  population 
model  (as  opposed  to  age  or  size  structure,  etc.  [57]).  It  is  known  that  cells  lose  FI  in  time  even  in  the  absence 
of  division  as  a  result  of  the  natural  decay  of  CFSE  and  the  turnover  of  intracellular  proteins  to  which  the 
fluorescent  conjugates  bind.  The  advection  term  in  the  equation  above  accounts  for  this  phenomenon  using  a 
Gompertz  [39]  decay  velocity  v(t,x)  =  —c(x  —  xa)e~kt  with  characteristic  parameters  c  and  k,  which  has  been 
shown  [18]  to  accurately  describe  the  biphasic  decay  [56,  60,  72]  of  CFSE  FI  observed  in  data  sets.  The  parameter 
xa  represents  the  natural  autofluorescence  intensity  of  cells  in  the  absence  of  CFSE,  assumed  for  the  moment  in 
(1)  to  be  constant  across  the  cell  population. 

The  goal  of  such  a  mathematical  model  is  to  provide  biologists  with  simple  yet  intuitive  and  meaningful 
parameters  with  which  a  population  of  dividing  cells  can  be  described.  In  particular,  information  such  as  average 
rates  of  division  and  cell  viability  are  essential  to  the  analysis  of  the  effects  of  changing  experimental  conditions 
(e.g.,  differences  in  donors,  differences  between  diseased  and  healthy  cells)  on  proliferative  behavior.  The  motiva¬ 
tion  for  the  use  of  FI  as  a  structure  variable  is  that  the  serial  dilution  of  CFSE  by  cell  division  creates  a  correlation 
between  measured  FI  and  the  number  of  divisions  a  cell  has  undergone.  Thus  the  proliferation  and  death  rate 
functions  a(t,  x)  and  /?(£,  x),  which  are  estimated  in  terms  of  the  structure  variable  x  as  well  as  time,  can  be  used 
to  compute  average  division  rates  in  terms  of  the  number  of  divisions  undergone  [18].  (For  instance,  if  x  >  1000 
UI  corresponds  to  undivided  cells,  then  the  average  proliferation  rate,  as  a  function  of  time,  for  undivided  cells  is 
the  average  value  of  a(t,  x)  in  the  region  x  >  1000.) 

This  motivating  assumption  is  accurate  to  a  degree,  as  one  can  clearly  discern  the  distinct  generations  of  cells 
in  the  data  set  depicted  in  Figure  1.  However,  the  peaks  corresponding  to  particular  generations  of  cells  overlap 
slightly  and  drift  to  the  left  in  time  (as  a  result  of  CFSE  decay),  thus  weakening  the  correlation  between  the  state 
variable  and  division  number.  In  [7,  18],  it  is  shown  that  the  proliferation  and  death  rates  can  be  parameterized 
with  respect  to  a  ‘translated  variable’  which  accounts  for  the  loss  of  measured  FI  in  time,  and  that  this  translated 
variable  is  more  strongly  correlated  with  division  number  than  the  original  structure  variable  x.  Yet,  the  overlap 
between  distinct  peaks  in  the  data  remains  problematic,  and  it  is  not  clear  how  much  error  may  be  introduced 
into  the  estimated  proliferation  and  death  rates  by  this  overlap  of  distinct  generations. 

Furthermore,  while  the  model  (1)  is  advantageous  in  being  able  to  estimate  average  proliferation  and  death 
rates  without  any  deconvolution  of  the  data  into  cell  numbers,  it  cannot  be  used  to  accurately  assess  the  number 
of  cells  in  a  particular  generation.  This  information  could  be  approximated  by  integrating  the  structured  density 
n(t,  x)  over  a  region  [xi ,  x-2]  (corresponding  approximately  to  the  location  of  a  given  peak  in  the  histogram  data), 
but  this  approximation  is  limited  by  the  extent  to  which  distinct  generations  of  cells  in  the  histogram  data  overlap. 
Traditional  deconvolution  techniques  (such  as  fitting  peaks  with  normal  or  lognormal  curves)  impose  particular 
forms  on  the  experimental  data  which  may  bias  the  computed  number  of  cells  in  each  generation. 

While  all  these  efforts  to  date  correspond  to  several  iterations  in  an  iterative  modeling  process  (for  a  philosoph¬ 
ical  discussion  see  [21,  Chapter  1])  to  attempt  to  understand  cell  proliferation  using  CFSE  labeling  of  populations, 
we  clearly  have  not  yet  reached  a  satisfactory  understanding  of  the  complex  phenomena  involved.  The  fragmen¬ 
tation  models  used  with  the  CFSE  data  can  be  considered  as  what  have  been  termed  Aggregate  Data/Aggregate 
Dynamics  or  Type  II  inverse  problems  as  presented  in  [1,  Chapter  14]  and  [5],  Such  problems  are  also  common  in 
investigations  with  models  for  electromagnetic  propagation  in  inhomogeneous  dielectric  materials  including  bio¬ 
tissue  [13,  14],  vibrational  dissipation  in  viscoelastic  materials  [16],  and  HIV  cellular  progression  models  [1,  4,  5], 
To  better  understand  rates  at  the  generation  number  cohort  or  division  number  cohort  level,  one  should  at¬ 
tempt  to  develop  individual  (cohort)  dynamics  to  investigate  the  CFSE  data  in  a  Type  I  framework  of  Aggregate 
Data/ Individual  (Cohort)  Dynamics  inverse  problems  such  as  those  discussed  in  [1,  Chapter  14]  and  [5],  Similar 
approaches  have  been  successfully  pursued  in  marine  and  insect  population  models  [3,  6,  10,  12,  22]  as  well  as  in 
physiologically  based  pharmacokinetic  (PBPK)  models  in  toxicology  [5,  17].  Fortunately,  a  simple  reformulation 
of  (1)  allows  such  an  approach  and  permits  both  the  accurate  quantification  of  total  cells  per  division  number 
and  the  accurate  estimation  of  proliferation  and  death  rates  in  terms  of  division  number  in  such  a  framework. 
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Rather  than  modeling  the  population  with  a  single  differential  equation,  one  can  model  each  individual  generation 
of  cells  with  a  single  equation, 


dni  d\v(t,x)ni{t,x)\ 
~9t  +  dx 


+  f3i(t))ni(t,x)  +Ri(t,x), 


with  the  generations  linked  through  the  division  mechanism  Ri(t,x)  as  a  source  term  (see  the  next  section). 
This  is  a  common  technique  in  existing  ordinary  and  delay  differential  equations  models  for  dividing  cells  (see 
[26,  30,  31]).  Because  each  generation  of  cells  is  assigned  to  a  particular  compartment  (indexed  by  i)  with  unique 
proliferation  and  death  rates,  it  is  not  necessary  to  estimate  these  rates  in  terms  of  the  structure  variable  x,  so 
that  peak  overlap  and  label  decay  no  longer  affect  the  accuracy  of  the  estimated  rates.  This  is  in  contrast  to 
previous  work  [18,  19]  in  which  considerable  space  is  devoted  to  answering  the  question  of  how  to  parameterize 
the  structural  dependence  of  the  proliferation  and  death  rates.  As  an  added  advantage,  the  number  of  parameters 
necessary  for  the  parameterizations  of  the  proliferation  and  death  rates  is  reduced  (because  there  is  no  longer  a 
need  to  parameterize  the  functions  a,;  and  /3,  in  terms  of  the  structure  variable).  Furthermore,  the  existence  of 
multiple  compartments  makes  it  possible  to  accurately  determine  cell  numbers  in  terms  of  divisions  undergone, 
even  though  the  computed  densities  (for  the  distinct  compartments)  will  still  overlap  when  placed  simultaneously 
on  the  x  axis.  Because  this  model  does  not  rely  upon  any  assumptions  as  to  the  shape  (normal,  lognormal,  etc.) 
of  the  generation  peaks  (instead  starting  from  an  initial  condition  and  fitting  directly  to  the  CFSE  histogram 
data)  systematic  bias  should  be  avoided. 

In  this  presentaion  we  begin  with  a  careful  formulation  of  a  division- dependent  compartmental  model ,  hereafter 
called  simply  the  compartmental  model.  The  solution  to  this  model  is  then  presented  and  computational  aspects 
are  discussed.  Next  we  establish  an  inverse  problem  for  the  estimation  of  the  AutoFI  and  Gompertz  parameters, 
as  well  as  the  proliferation  and  death  rate  functions  a i(t)  and  f3i(t).  As  in  previous  work  [18,  19],  multiple 
parameterizations  of  the  proliferation  and  death  rate  functions  are  considered  with  the  goal  of  determining  how 
these  rates  depend  on  both  division  number  and  on  time.  After  presenting  results  which  demonstrate  the  enhanced 
capabilities  of  the  compartmental  model,  the  statistical  properties  of  the  flow  cytometry  data  are  considered  and 
ramifications  for  the  quantification  of  uncertainty  in  the  estimated  parameters  are  discussed. 


2  The  Compartmental  Model 

The  derivation  of  the  compartmental  model  follows  immediately  from  the  derivation  of  the  fragmentation  model 
(1)  in  [18],  which  is  itself  a  variation  of  the  structured  population  models  of  Bell- Anderson  [23]  and  Sinko- 
Streifer  [65].  A  complete  derivation  of  the  compartmental  model  can  be  found  in  Chapter  3  of  [68].  Let  m(t,x), 
0  <  i  <  tmax  be  the  label  structured  population  density  of  a  population  of  cells  stained  with  CFSE  and  having 
undergone  i  divisions.  The  structure  variable  x  is  the  fluorescence  intensity  (FI)  of  a  cell  (in  arbitrary  units  of 
intensity,  UI)  satisfying  x  >  xa  where  xa  is  the  natural  autofluorescence  intensity  (AutoFI)  of  cells;  t  is  time  (in 
hours).  While  it  is  known  that  AutoFI  increases  significantly  when  cells  become  activated,  this  increase  is  not 
believed  to  be  significant  for  the  current  modeling  effort  (as  AutoFI  contributes  minimally  to  the  measured  FI  of 
a  labeled  but  unactivated  cell).  Thus,  the  parameter  xa  should  be  understood  to  describe  AutoFI  for  activated 
cells.  It  is  known  that  FI  scales  linearly  with  the  concentration  of  CFSE  used  to  label  a  population  of  cells,  and 
that  this  measurement  does  not  change  significantly  when  cells  increase  in  size  [55].  Thus  we  assume  FI  is  a 
nrass-like  quantity. 

The  label-structured  density  of  a  population  of  dividing  cells  is  modeled  by  the  system  of  PDEs 
d n0  ,  d[v(t,x)n0(t,x)\ 

-Qf  H - - =  -  (o!o(t)  +  /3o(t))n0{t,  x) 

dn\  d\v(t,x)ni(t,x)]  ,  .  .  -  .  ..  ,  .  „  . 

~dt  4 - dx - =  ~  ^ ai  W  +  &  W)ni  (*> x )  +  Ri  (*> x) 


,  9[v(t,x)nimBJ[(t,x)\ 
dt  dx 


ftmax  (*Hma*  (M)  +  (*,  X) 


(2) 
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where  v(t,  x)  is  the  natural  rate  of  CFSE  FI  decay  (as  a  result  of  the  turnover  of  CFSE  within  individual  cells) 
and  Ri(t,x)  =  4a»_i(£)ni_i(t,  2x  —  xa)  for  1  <  i  <  imax  represents  the  influx  of  newly  divided  cells.  Note  the 
assumption  that  cqmax  =  0.  While  there  is  no  mathematical  limit  to  the  number  of  generations  which  can  be 
computed,  experimental  data  generally  exhibits  fewer  than  10  divisions.  In  an  inverse  problem  setting  (see  Section 
3),  the  parameter  imax  can  be  easily  fixed  in  advance  by  simply  counting  the  number  of  generations  which  appear 
in  the  data.  Because  it  is  then  known  that  there  are  no  cells  with  generation  number  imax  +  1,  there  must  be 
no  proliferation  in  generation  imax,  and  the  model  can  be  simplified  by  setting  a*max  =  0.  Of  course,  the  process 
of  determining  imax  could  be  automated  via  model  refinement  statistical  tests,  but  that  seems  unnecessary  given 
the  ease  with  which  the  parameter  can  be  identified  from  data. 

There  is  an  additional  mathematical  justification  for  setting  cqmax  =  0.  The  total  quantity  of  CFSE  FI  in  the 
population  is 


M{t) 


dx. 


Using  the  definition  of  M(t)  and  the  system  of  equations  (2),  we  can  show  that 


dM 

dt 


/»  OO  /  6 max  \  /*00  /  6 max  \ 

=  J  v(t,x)  y^2ni(t,x)J  -  J  x  \^2f3i(t)ni(t,x)J 

/OO  /  max  \  /*  OO 

W«i(C  x)  -  /  X  («imax(0«imax(C  x)) . 

a  \  j— 0  /  d  Xa 


While  the  first  three  terms  on  the  right  side  of  this  equation  are  physically  relevant  and  expected  (loss  of  FI 
by  Gompertz  decay,  loss  of  FI  by  death,  and  the  additive  role  of  AutoFI,  respectively)  the  final  term  is  not 
experimentally  valid  because  cells  do  not  recognize  a  maximum  division  number  after  which  they  must  leave  the 
measured  population.  The  requirement  that  cqmax  =  0  eliminates  this  term. 

The  initial  condition  must  be  prescribed  for  each  i, 


?ii(0,  x)  =  $i(x).  (3) 

It  will  generally  (but  not  necessarily  always)  be  true  that  <Fj(a;)  =  0  for  i  >  1  (that  is,  all  cells  in  the  population 
are  undivided  at  t  =  0).  These  initial  condition  curves  are  determined  from  data  taken  at  t  =  0  (see  Section  ??). 
The  left  (x  =  xa)  boundary  conditions  are  the  no-flux  boundary  conditions 


v(t,  xa)ni(t,xa)  =  0  (4) 

for  all  0  <  i  <  im ax-  Because  the  problem  is  defined  on  the  semi-infinite  domain  x  >  xa,  these  conditions  are 
sufficient  to  compute  a  solution.  This  is  in  contrast  to  previous  work  [18,  19,  51,  53]  in  which  a  zero-recruitment 
boundary  condition,  n(t,  a;max)  =  0,  is  imposed  at  the  right  boundary  of  the  computational  domain.  As  discussed 
in  the  next  section,  under  appropriate  conditions  these  two  formulations  are  equivalent. 

2.1  Model  Solution 

For  many  decay  velocities  v(t,x)  of  interest,  the  system  of  equations  (2)  can  be  solved  analytically  using  the 
method  of  characteristics.  As  discussed  previously,  we  assume  that  the  rate  at  which  cells  naturally  lose  FI  is 
described  by  the  same  function,  v(t,x),  for  all  cells  independent  of  division  number.  As  such,  the  characteristic 
lines  are  the  same  for  each  generation  of  cells.  Furthermore,  it  will  be  assumed  that  this  rate  of  FI  loss  is  adequately 
described  by  a  Gompertz  decay  process  [39];  this  has  been  shown  [18]  to  effectively  describe  the  biphasic  decay 
[56,  60,  72]  characteristic  of  proliferation  assay  data  when  the  intracellular  label  is  CFSE.  Thus  we  have 

v(t,  x)  =  —  c(x  —  xa)e~kt  (5) 

where  c  >  0  and  k  >  0  (both  with  units  1/hr)  are  parameters  to  be  estimated  using  the  data.  In  effect,  this 
function  describes  cellular  FI  which  decreases  exponentially  (with  initial  rate  c)  to  the  level  of  cellular  AutoFI, 
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while  the  exponential  rate  itself  decreases  (exponentially)  with  rate  k.  The  assumption  of  Gompertz  decay  of 
cellular  FI  has  the  additional  benefit  of  trivially  satisfying  the  left  boundary  condition  (4)  for  all  i,  provided 
rii(t,  xa)  is  finite  (so  that  the  flux  at  the  boundary  is  well-defined). 

Incorporating  the  Gompertz  decay  process,  we  can  rewrite  the  system  (2)  as 

-  ce~kt(x  -  xa)^-  =  ~  («o (t)  +  Po(t)  -  ce~kt)n0(t,x) 

-  ce~kt(x  -  Xa)^-  =  ~  (an (i)  +  Pi (t)  -  ce~kt)m(t,x)  +  Ri(t,x) 


dniiaax  -ku  \9n 

max  _  ^ 


dt 


dx 


=  ~  (ft™  (*)  ~  ce  (t,x)+  (t,x). 


The  characteristic  lines  (for  all  i)  are  described  by 

dx 


dt 


=  v(t,  x)  =  —c(x  —  xa)e 


—  kt 


and  hence  the  characteristic  line  emanating  from  the  point  (0,  s)  in  the  ta-plane  is 


(6) 

(7) 


x(t;  s ) 


Xa  +  ( S 


2Jo)exp 


(8) 


where  s  >  xa  parameterizes  the  line  along  which  the  initial  condition  is  prescribed. 

Define 

fi(t)  =  ai(t)  +  Pi(t)  -  ce~kt. 

For  undivided  cells  (*  =  0),  the  solution  along  a  characteristic  line  emanating  from  a  point  (0,  s)  in  the  ta-plane 
is  given  by 

=  -fo(t)n0(t,x(t;s)) 

with  no(0,  x(0;  s))  =  no(0,  s)  =  $0(5)-  Thus  the  solution  along  characteristic  lines  is 


n0{t,x(t-,s))  =  $0(s)exp 


(9) 


As  written  above,  the  system  of  equations  (6)  is  defined  on  the  semi- infinite  domain  x  >  xa.  In  general,  the  initial 
condition  function  $0(2:),  can  be  determined  from  data  (see  Section  3)  only  on  some  finite  segment  [ xa,xmeiX ]  of 
the  domain.  However,  there  is  no  loss  of  generality  in  extending  the  initial  condition  curve  by  assuming  $0  (x)  =  0 
if  x  >  a’max.  This  is  in  contrast  to  [18,  19,  51,  53]  in  which  a  PDE  was  defined  only  on  the  finite  interval  [xa,  xmax] 
and  a  zero-recruitment  boundary  was  imposed.  In  fact,  the  two  formulations  are  equivalent  provided  <b(a;max)  =  0 
(in  the  former  models;  <Fj(a:max)  =  0  for  all  i  in  the  compartmental  model)  and  v(t,x)  <  0.  As  the  semi-infinite 
formulation  is  notationally  simpler  and  easy  to  implement,  we  use  it  here. 

The  solutions  for  i  >  1  along  the  same  characteristic  lines  (8)  are  described  by 

dfi  ■ 

=  -  fi{t)rii{t ,  x(t;  s))  +  Ri(t,  x(t;  s))  (10) 

with  n.j(0,  x(0;  s))  =  $*(s)  and  the  solutions  are 


rii(t,  x(t;  s))  =  $j(s)exp  J  /,  (r)dr^  +  J  Ri(r,  ar(r;  s))exp  J  fi{^)d^j  dr.  (11) 

It  is  worth  noting  that  the  solution  by  the  method  of  characteristics  involves  the  construction  of  an  integral 
surface  in  the  coordinates  t  and  s.  The  change  of  coordinates  from  t  and  x  to  t  and  s  has  Jacobian 


J  = 


dt  dt 
dt  ds 
dx  dx 
dt  ds 


eXp(_£(i_e-«)) 
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Figure  2:  Characteristic  lines  given  by  Equations  (7)-(8)  when  c  =  lx  10-2  and  k  =  2  x  10~2  (left)  and  in  the 
limiting  case  when  k  =  0  (right).  Notice  that  distinct  characteristic  lines  will  remain  separated  by  some  positive 
distance  for  all  time  in  the  former  case,  while  in  the  latter  case  the  lines  asymptotically  converge  to  x  =  xa.  The 
horizontal  broken  line  along  the  bottom  of  both  graphics  is  the  line  along  which  the  initial  condition  data  is  given. 
It  is  clear  that  this  initial  condition  curve  is  nowhere  tangent  to  a  characteristic  line,  hence  the  local  existence  of 
a  unique  solution  is  guaranteed. 


which  is  nonsingular  along  the  initial  condition  curve  {t  =  0).  Hence  we  are  guaranteed  (by  the  construction 
above)  that  a  unique  solution  exists  at  least  locally  near  the  initial  condition  curve.  Note  that  in  the  limit  as 
k  — >  0+,  the  Jacobian  is  J^o  =  e~ct,  which  becomes  singular  asymptotically  in  time  (reflecting  the  asymptotic 
convergence  of  the  characteristic  lines).  In  such  a  case,  one  might  observe  solutions  which  grow  without  bound. 
This  is  only  of  minimal  concern,  however,  as  the  total  label  loss  resulting  from  decay  is  small  over  the  duration 
of  a  typical  experiment.  Possible  characteristic  lines  (for  k  >  0  and  k  =  0)  are  shown  graphically  in  Figure  2. 

For  the  remainder  of  this  document,  it  will  be  assumed  that  all  cells  are  undivided  at  t  =  0,  so  that  fib  (a;)  =  0 
for  i  >  1.  This  condition  is  satisfied  by  essentially  all  experimental  data.  Thus,  the  only  nontrivial  initial  condition 
for  the  PDE  system  (6)  is  $o  (%)  ■  As  this  model  is  motivated  by  an  attempt  to  fit  and  explain  experimental  data, 
this  smooth  initial  condition  must  be  constructed  from  data  taken  at  the  beginning  of  the  experiment.  Our  process 
for  doing  so  is  described  in  detail  in  [20,  68];  essentially  a  smooth  curve  is  drawn  through  the  original  histogram 
data  which  is  taken  to  represent  the  ‘true’  cell  counts  in  the  absence  of  noise.  Given  the  initial  condition  function 
$0(20  as  computed  this  way  from  data,  it  then  remains  to  numerically  compute  the  solutions  (9)  and  (11)  for 
no(t,x)  and  n,i(t,x),  1  <  i  <  im ax,  respectively.  Complete  implementation  details  on  our  numerical  methods 
(along  with  a  summary  of  simulations  to  ensure  numerical  convergence)  based  on  characteristics  in  this  context 
are  given  [20,  68]. 


3  Inverse  Problem  Formulation 

We  now  consider  the  inverse  problem  of  calibrating  the  model  (6)  to  a  particular  data  set.  As  stated  previously, 
the  data  consist  of  ordered  pairs  ( zJk ,  nJk)  indicating  the  total  number  of  cells  nJk  counted  into  the  histogram  bins 
with  left  boundary  at  zk  (in  the  log  FI  coordinate)  at  time  tj .  The  notation  is  meant  to  emphasize  the  possibility 
that  the  histogram  bins  need  not  share  a  common  fixed  width,  nor  need  they  be  the  same  at  each  measurement 
time.  The  data  set  we  will  use  to  calibrate  the  compartmental  model  is  shown  in  Figure  3,  with  measurements 
taken  at  t  =  24,  48,  96,  and  120  hours. 

Let  rii(t,  x)  be  the  solution  of  the  compartmental  model  for  cells  having  undergone  i  divisions.  Then  the  total 
population  of  cells  is 

n(t,x)  =  rii(t,x). 
i= 0 

Because  this  model  solution  is  computed  in  the  linear  FI  coordinate  x  while  the  data  is  given  in  the  logarithmic 
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Figure  3:  CFSE  data  set  for  the  compartmental  model. 


FI  coordinate  z  =  log10(x),  we  define 

h(t,  z)  =  10“  ln(10)n(i,  x(z))  =  10z  ln(10)n(t,  10z).  (12) 

The  function  n(t,  z)  is  the  structured  population  density  in  terms  of  the  new  structure  variable  z.  The  factor 
10z  ln(10)  arises  from  the  chain  rule  in  the  integral  formulation  of  the  model  (see  Section  2)  and  is  needed  to 
conserve  the  quantity  of  label  after  the  change  of  variables.  Finally,  we  need  to  convert  this  structured  density 
into  cell  counts  for  comparison  with  the  data.  Thus  we  define 

rzl+i 

=  /  .  n(tj,z)dz, 

•'4 

which  is  the  observation  operator  for  the  compartmental  model.  In  practice,  because  the  transformed  model 
solution  h(t,  z)  is  computed  only  at  discrete  points  ( tj ,  zJk),  we  must  approximate  this  observation  operator, 


/[n](tj,4)  «  IA[n](£j,4) 


2 


(13) 


3.1  Ordinary  Least  Squares 

Given  an  initial  condition,  the  solution  n(t ,  x)  (and  hence,  n(t,  z))  is  completely  determined  by  the  parameters  xa 
(AutoFI),  c  and  k  (Gompertz  decay),  as  well  as  the  proliferation  rates  {«i(t)}  and  the  death  rates  {/3 i(i)}.  Let 
9  =  {a;a,  c,  k,  (aj(t)},  {Pi{t)}}  C  0,  where  0  is  some  set  of  admissible  values  for  9.  (While  it  will  be  necessary  to 
make  some  simplifying  assumptions  on  0  in  order  to  render  the  inverse  problem  computationally  tractable,  we 
postpone  that  discussion  for  the  moment  and  proceed  with  a  general  overview  of  the  inverse  problem  procedure.) 
Thus  we  may  write  the  model  as  n(t,X',9).  The  goal  of  the  inverse  problem  is  to  determine  some  value  of  the 
parameter  9  which  minimizes  the  distance  (in  an  appropriate  sense)  between  the  cell  counts  determined  by  the 
model  solution,  I[h](tj,  zj.),  and  the  histogram  data.  For  this  report,  we  choose  least  squares  as  the  method  of 
estimation.  Following  standard  inverse  problem  procedure  [21,  28,  29,  63],  we  define  the  random  variables 

Nk  =  J[«](L/>  4;  Oo)  +  £kj,  (14) 
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where  £kj  are  independent  random  variables  satisfying  E[£kj]  =  0  representing  measurement  error  and/or  ‘noise’ 
in  the  data.  The  parameter  Oq  is  the  ‘true’  parameter  (given  the  model)  which  is  assumed  to  exist  and  to  describe 
the  data.  The  data,  then,  represent  a  single  realization  of  these  random  variables, 

ni  =  I[n]{tj,zJk;d0)  +  ekj. 

The  assumption  that  the  data  are  generated  from  the  specified  model,  given  a  nominal  truth  parameter,  is  common 
in  inverse  problem  formulations  [8,  21].  While  Oq  is  generally  unknown,  we  can  define  the  estimator 

eWLS  =  argnunE  ^  ^9)  N’)2 ,  (15) 


which  minimizes  the  weighted  sum  (with  weights  wkj  )  of  squared  residuals  Rkj-  Because  the  Nk  are  random 
variables,  so  are  the  Rkj  and,  hence,  so  is  Owls-  Using  the  data,  we  may  obtain  the  estimate 


(WsK)  =  “gmin ■£  -L  =  argmm]T  —  (/[«](*„ 
k,j  J  k,j  J 


zi-,0) 


In  theory,  the  weights  wk^  should  be  chosen  to  reflect  the  variance  of  the  random  variables  Nk.  In  fact, 
the  accurate,  unbiased  estimation  of  standard  errors  as  well  as  confidence  intervals  around  parameter  estimates 
is  premised  upon  an  accurate  statistical  model  (hence  accurate  weights)  for  the  error  terms  £kj.  In  practice, 
however,  such  a  statistical  model  is  rarely  (if  ever)  known  a  priori,  and  some  additional  assumptions  must  be 
made.  For  this  report,  we  assume  a  constant  variance  (CV)  error  model,  Var(£kj )  =  a2  for  all  k  and  j.  In  this 
case,  wkj  =  1  for  all  k  and  j  and  (15)  becomes  an  ordinary  least  squares  (OLS)  problem, 

Ools  =  argmin  J(0\Nk)  =  E ^tj  =  arS“biE  zk’  9 )  “  NkY >  (16) 

k,j  k,j 


with  corresponding  estimate 

OoLs{n{)  =  argmin  J(0  \  n3k). 


The  function  J(6\ni)  is  the  OLS  cost  of  the  model,  given  the  data,  and  is  often  written  simply  as  J{0)-  The 
expanded  notation  is  meant  to  emphasize  the  dependence  of  the  estimate  on  the  particular  data  set  used  to  fit 
the  model. 

It  should  be  noted  that,  rather  than  consider  constant  variance  errors  in  an  OLS  framework,  one  could  alter¬ 
natively  consider  a  statistical  model  with  constant  coefficient  of  variation  (CCV),  Var(£kj)  =  CTg(/[h](tj ,  zk]  Qq))2 ■ 
Then  wkj  =  (I[n](tj,  z3k,  Ogls ))2  and  (15)  becomes  the  generalized  least  squares  (GLS)  problem  defined  implicitly 
by 


Ogls  =  argmin  E 

k,j 


n 


2 

kj 


{I[n\(tj,zJk;0GLs))2 


=  arg  mm  > 
see 

k,j 


(JMtj’Z^O)  -  Nj)2 
{imj^zlOGLs))2  ’ 


(17) 


with  corresponding  estimate  Ogls{ji D-  As  noted  above,  the  results  presented  in  Section  4  will  focus  on  parameter 
estimation  in  an  OLS  framework.  A  more  thorough  consideration  of  the  reliability  of  the  assumptions  for  the 
statistical  error  model  in  the  inverse  problem  is  postponed  until  the  Discussion.  For  the  moment,  we  focus  on  the 
applicability  of  the  compartmental  model  to  a  particular  data  set-that  is,  how  well  the  compartmental  model  fits 
the  data.  Of  course,  the  measure  of  fit  is  assessed  in  an  OLS  framework,  which  may  be  slightly  different  than  a  GLS 
or  more  general  WLS  framework.  The  misspecffication  of  the  error  model  is  known  to  result  in  biased  standard 
errors  (and  hence  inaccurate  confidence  intervals),  and  thus  no  such  work  is  carried  out  here.  (Related  efforts  on 
determination  of  the  precise  form  of  measurement  error,  and  hence  the  corresponding  statistical  error,  in  a  family 
of  data  sets  similar  to  the  one  used  here  is  being  pursued  and  will  be  reported  on  in  a  separate  manuscript.)  In 
spite  of  this  drawback,  a  slight  misspecification  of  the  exact  error  model  should  have  only  minimal  effect  on  the 
estimated  best-fit  parameters  (see,  e.g.,  the  computational  example  of  Section  3.4.2  of  [21]),  and  thus  we  proceed 
with  the  OLS  estimation  of  Oq. 
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3.2  Parameterizations  of  Proliferation  and  Death  Rates 

We  have  already  defined  the  parameter  9  =  {xa,  c,  k,  (ai(t)},  {Pi(t)}}  C  0  which  describes  a  given  model  solution. 
The  parameters  xa ,  c,  and  k  are  all  elements  of  M  (although  in  a  generalization  below,  we  consider  estimation  of  a 
probability  distribution  on  the  parameter  xa)  and  thus  pose  no  problem  for  the  estimation  procedure.  However,  the 
proliferation  and  death  rates  cii(t)  and  Ppt),  0  <  i  <  imax,  are  contained  in  some  (infinite-dimensional)  function 
space.  Mathematically,  the  solutions  (9)  and  (11)  require  only  cq(f),  Pi{t)  £  £2(0,  T)  in  order  for  the  solution  to  be 
well-defined.  Because  it  can  reasonably  be  assumed  that  these  functions  are  bounded,  this  condition  is  naturally 
met.  However,  as  currently  written,  (16)  contains  a  minimization  over  an  infinite-dimensional  space  0.  In  order 
to  make  the  estimation  problem  amenable  to  computation,  additional  assumptions  and/or  approximations  are 
necessary. 

The  primary  motivation  for  using  a  label-structured  PDE  model  to  analyze  histogram  data  from  CFSE- 
based  proliferation  assays  was  an  attempt  to  use  measured  FI  as  a  surrogate  for  division  number  and  hence  to 
investigate  how  the  proliferation  and  death  rates  for  a  population  of  cells  change  with  division  number.  In  earlier 
efforts  [18,  19,  51,  53],  this  was  accomplished  by  allowing  the  proliferation  and  death  rates  to  depend  explicitly 
on  the  state  variable  (a:  or  z).  For  the  compartmental  model  formulated  in  this  report,  the  number  of  divisions 
undergone  is  accounted  for  directly,  so  that  it  is  no  longer  necessary  to  have  the  a*  and  pi  dependent  upon  the 
structure  variable  (following  the  assumption  that  the  interference  of  CFSE  with  the  intracellular  machinery  is 
negligible).  Additionally,  it  was  found  in  [18,  19]  that  explicit  time-dependence  of  the  rate  of  cell  proliferation  is 
a  significant  feature  of  an  accurate  label  structured  PDE  model.  We  would  like  to  pursue  this  investigation  using 
the  new  compartmental  model.  Thus,  as  we  consider  possible  parameterizations  of  the  proliferation  and  death 
rate  functions,  we  do  so  with  an  eye  toward  determining  the  heterogeneity  of  the  rates  (that  is,  how  they  vary 
with  division  number),  as  well  as  the  possible  time-dependence  of  the  proliferation  rates. 

We  begin  with  the  death  rate  functions  p%{t).  It  has  long  been  observed  that  a  significant  proportion  of 
undivided  cells  die  in  the  first  few  days  in  culture,  and  that  this  cell  death  occurs  independent  of  cellular  activation 
[38].  Beyond  these  observations,  we  would  like  to  explore  how  the  death  rates  of  cells  depend  on  division  number. 
Specifically  we  consider  the  following  possible  parameterizations  for  the  death  rate  functions  f3i.it): 

B1  Pi(t)  =  0  for  all  i  and  for  all  t ; 

B2  (3i{t)  =  f3  for  all  i  and  for  all  t; 

B3  /30{t)  =  p0,  Pi(t)  =  0  for  i  >  1; 

B4  /30(t)  =  p0,  Pi{t)  =  P  for  i  >  1; 

B5  pi{t)  =  pi  for  each  i. 

The  possibility  B1  is  included  as  a  baseline  for  comparison,  as  a  means  of  concluding  the  necessity  of  a 
death  term  in  the  mathematical  model.  As  noted  above,  it  is  expected  that  a  model  which  lacks  a  mechanism 
to  describe  cell  death  will  predict  far  too  many  cells  in  the  population  when  compared  to  the  experimental 
observations  [32,  38].  Parametrization  B2  assumes  a  constant  death  rate  in  the  population  for  all  cells  regardless 
of  division  number.  Gett  and  Hodgkin  [38]  have  shown  that  parametrization  B3,  in  which  undivided  cells  die 
but  all  cells  which  proceed  through  the  first  division  will  remain  in  the  population  indefinitely,  can  be  accurately 
used  to  predict  the  number  of  cells  in  the  population  up  to  approximately  90  hours.  More  generally,  one  might 
consider  that  cells  which  have  divided  at  least  once  may  die,  but  at  a  rate  which  is  possibly  different  from  the 
rate  for  undivided  cells.  This  parametrization  B4  has  also  been  successfully  used  to  model  proliferation  assay 
data  [30,  31,  32].  Finally,  in  parametrization  B5,  we  consider  the  possibility  that  the  death  rate  is  completely 
heterogenous  with  respect  to  division  number  [30,  36,  43,  47]. 

While  the  model  is  derived  in  sufficiently  general  terms  to  include  time-dependent  death  rate  functions,  we 
do  not  consider  any  such  parameterizations  in  this  report.  It  is  certainly  possible  that,  for  particular  cell  lines 
and  under  particular  culture  conditions,  feedback  mechanisms  such  as  activation-induced  cell  death  may  in  fact 
be  time-dependent  [38].  For  a  (hypothetical)  population  of  cells  which  divides  almost  synchronously,  such  time- 
dependence  would  be  identical  to  division-number-dependence  (i.e.,  a  mechanism  which  does  not  appear  until, 
say,  90  hours  could  be  equivalently  modeled  as  a  mechanism  which  does  not  appear  until  3  divisions  have  been 
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completed).  Thus  it  seems  reasonable  to  conclude  that,  to  some  extent,  the  necessity  of  time-dependent  death 
rates  in  the  mathematical  model  will  depend  on  the  degree  of  synchronicity  observed  in  the  experimental  data. 
At  the  very  least,  past  experience  [18,  19]  as  well  as  the  results  presented  here  (Section  4)  seem  to  suggest  little 
need  for  such  time-dependence,  at  least  for  the  current  data  set. 

Unlike  for  the  death  rate  functions,  past  experience  [18,  19]  does  indicate  a  potential  need  for  explicit  time- 
dependence  of  the  proliferation  rate  functions  (The  fact  that  time  dependence  for  cell  death  rates  seems 

to  be  sufficiently  modeled  with  only  division  dependence  while  a  similar  result  does  not  hold  for  the  proliferation 
rates  may  be  explained  if  the  time-dependence  of  the  proliferation  rates  occurs  on  a  scale  faster  than  the  aver¬ 
age  time  a  cell  takes  between  subsequent  divisions.)  To  explore  possibilities  we  consider  the  following  possible 
parameterizations  for  the  proliferation  rate  functions: 

A1  ao{t)  =  cco;  cc,(t)  =  a  for  all  i ; 

A2  ai(t)  =  on  for  all  t; 

A3  a0(t)  =  a0X[t>r];  «»(*)  =  a  for  all  i; 

A4  a0(t)  =  aoX[t>t*]l  ai (t)  =  ai'i 

A5  piecewise  linear  functions  of  time  (see  below). 

Previous  authors  [32,  38,  41]  have  emphasized  a  special  importance  for  the  time  required  for  a  cell  to  complete 
its  first  division.  In  case  Al,  it  is  assumed  that  undivided  cells  divide  at  a  rate  which  may  be  different  than 
the  rate  for  divided  cells,  but  that  neither  of  these  two  rates  depends  on  time  [31].  Alternatively,  we  consider 
the  more  general  case  A2  where  each  generation  of  cells  divides  with  its  own  (time-independent)  rate  [48].  We 
also  consider  a  simple  time-dependent  mechanism  in  which  there  is  a  delay  before  cells  begin  to  divide.  A  quick 
glance  at  the  data  (Figure  1)  reveals  that  no  division  occurs  in  the  population  for  at  least  the  first  24  hours. 
Such  a  delay  can  be  easily  incorporated  into  the  model  with  a  step  function  at  some  specified  time  t*.  Previous 
models  [31]  have  found  such  a  transient  in  the  undivided  population  to  be  a  significant  feature  of  an  accurate 
mathematical  model.  The  proliferation  rates  for  subsequent  generations  may  A4  or  may  not  A3  vary  with  the 
number  of  divisions  undergone. 

Finally,  following  the  example  of  [18],  we  consider  using  piecewise  linear  splines  to  incorporate  time-dependence 
into  the  proliferation  rates.  Given  a  fixed  set  of  nodes  {ta}},  we  consider  rates  of  the  form 

&i(t)  = 

Q 


where  l\q\ta})  =  1  if  p  =  q  and  is  zero  if  p  ^  q.  In  Table  1  we  list  the  nodes  {ta}}  used  for  the  estimation  of 
the  proliferation  rate  functions.  These  particular  nodes  have  been  chosen  based  upon  careful  consideration  of  the 
data  in  Figure  1  as  well  as  past  experience. 

Independent  of  which  parameterizations  of  the  proliferation  and  death  rates  are  used,  it  should  be  noted 
that  the  current  model  formulation  features  proliferation  and  death  rates  which  are  essentially  Malthusian  in 
nature  (see  Section  2).  That  is,  the  rates  at  which  cells  in  a  particular  generation  divide  and  die  is  assumed 
to  be  proportional  to  the  total  number  of  cells  in  that  generation  (with  ‘constants’  of  proportionality  a,  (t) 
and  for  proliferation  and  death,  respectively).  Alternatively,  a  model  can  easily  be  derived  with  limiting 
proliferation  and  death  rates  (e.g.,  logistic  rates,  Gompertz  rates,  etc.).  Malthusian  rates  have  been  used  with 
some  success  in  previous  models  and  should  be  accurate  for  any  population  of  cells  which  divides  sufficiently 
rapidly.  Biologically  speaking,  a  cell  must  proceed  through  several  necessary  activities  (growth,  DNA  replication, 
microtubule  formation,  etc.)  between  any  two  divisions,  and  this  must  induce  some  minimum  cell  cycle  time. 
Tools  such  as  delay  differential  equations  or  stochastic  processes  have  been  used  to  mathematically  model  the 
cell  cycle  (see,  e.g.,  [30,  31,  34,  37,  41,  44,  45,  47,  59,  66,  75])  and  have  resulted  in  several  successful  models.  We 
find  the  current  model  with  its  Malthusian  rates  to  be  simple  and  intuitive  while  also  fully  capable  of  accurately 
fitting  the  data  (see  Section  4).  However,  it  is  imperative  that  the  parameters  estimated  when  fitting  the  model 
to  a  particular  data  set  be  interpreted  in  the  context  of  the  form  of  the  model  being  used. 
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Table  1:  Chosen  nodes  for  the  estimation  of  piecewise  linear  proliferation  rates.  Bold  font  indicates  a  node  for 
which  the  proliferation  rate  was  set  to  zero  rather  than  estimated.  For  each  generation,  the  proliferation  rate  is 
assumed  to  be  zero  outside  the  set  of  nodes  shown.  Thus  the  proliferation  rate  is  estimated  at  three  nodes  for 
each  division  number.  _ 


Generation  ( i ) 

{^} 

0 

24,48,60,72,96 

1 

48,60,84,108,120 

2 

48,60,84,108,120 

3 

60,72,96,120 

4 

60,72,96,120 

5 

60,72,96,120 

6 

60,72,96,120 

3.3  Probabilistically  Distributed  AutoFI 

The  derivation  and  solution  of  the  compartmental  model  have  been  given  so  far  under  the  assumption  that  the 
natural  brightness  of  cells  in  the  absence  of  any  CFSE  molecules,  i.e. ,  the  autofluorescence  intensity  or  AutoFI, 
can  be  modeled  with  sufficient  accuracy  by  a  single  scalar  parameter  xa.  However,  it  is  known  that  the  AutoFI 
of  a  single  cell  changes  as  the  cell  becomes  activated,  and  that  AutoFI  varies  from  cell  to  cell  in  the  population, 
even  among  activated  cells. 

The  AutoFI  of  cells  can  be  measured  directly  by  setting  aside  a  portion  of  cells  from  the  PBMC  culture 
which  are  not  labeled  with  CFSE  (but  which  receive  an  otherwise  identical  treatment).  The  results  of  such  a 
measurement  are  depicted  in  Figure  4  for  two  donors,  each  at  two  different  measurement  times.  These  data  sets 
were  taken  independently  of  the  data  set  shown  in  Figure  1,  which  is  used  to  calibrate  the  model.  Because  FI 
measurements  are  not  absolute-they  depend  on  the  calibration  and  gain  settings  of  the  flow  cytometer  at  the  time 
of  the  experiment-the  data  shown  in  Figure  4  are  intended  only  to  examine  the  shape  of  the  AutoFI  distribution 
in  the  population,  not  its  absolute  magnitude.  As  time  progresses,  the  distribution  of  AutoFI  in  the  data  from 
both  donors  increases  slightly  in  mean  and  is  increasingly  skewed  to  the  right.  These  features  are  also  found 
in  additional  data  sets  for  24  <  t  <  144  (results  unpublished)  and  appear  to  be  the  result  of  some  unmodeled 
biological  processes.  The  most  likely  explanation  is  the  known  increase  in  AutoFI  as  cells  become  activated  [55, 
Fig.  6].  After  a  sufficient  amount  of  time,  essentially  all  cells  in  the  culture  have  either  become  activated  or  have 
died. 

Following  the  discussion  at  the  beginning  of  Section  2,  we  may  consider  only  AutoFI  for  activated  cells. 
While  we  have  thus  far  assumed  that  this  AutoFI  can  be  sufficiently  modeled  with  a  single  parameter,  Figure 
4  suggests  that  we  might  need  to  consider  a  probability  distribution  on  a  range  of  values  for  the  parameter  xa- 
Let  n(t,  x\  Xa)  represent  the  structured  population  density  of  a  cohort  or  subpopulation  of  cells  all  of  which  share 
the  same  AutoFI  parameter  xa,  subject  to  (6).  Assume  further  that  this  parameter  xa  is  distributed  in  the  total 
population  of  cells  with  some  probability  distribution  P.  Then  it  follows  that  the  total  population  is  described 

by 

rj(t,x)  =  E[n(t,x-,xa)\P]  =  (  n(t,x;xa)d,P(xa).  (18) 

J  j>min 

It  is  now  clear  that  the  structured  density  r/(t,  x)  for  the  total  population  of  cells  will  depend  upon  the 
probability  measure  P.  Figure  4  depicts  the  experimental  AutoFI  data  for  each  donor  and  measurement  time 
fitted  (ordinary  least  squares)  with  a  scaled  lognormal  curve.  While  such  an  assumption  may  possibly  be  of 
limited  validity  early  in  the  experiment  (probably  as  a  result  of  the  activation  process,  as  discussed  above),  most 
cells  are  undivided  at  such  times  and  hence  the  contribution  of  AutoFI  to  the  total  FI  of  those  cells  is  minimal. 
Thus  we  may  assume  that  P  is  reasonably  well-described  by  a  lognormal  distribution.  Hence 
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Figure  4:  Experimentally  determined  AutoFI  distributions  with  OLS  best-fit  scaled  lognormal  curves.  PBMCs 
of  2  blood  donors  were  cultured  without  CFSE  staining  and  without  stimulation.  After  24  (left)  and  144  (right) 
hours  respectively,  cells  were  stained  for  CD4  surface  expression  and  analyzed  by  flow  cytometry.  Shown  are  the 
histograms  of  CD4  cell  counts  as  a  function  of  CFSE  FI.  We  see  that  a  lognormal  distribution  for  AutoFI  is  quite 
accurate  by  t  =  144  hours  (right).  Such  an  assumption  is  less  accurate  at  t  =  24  hours,  when  a  significant  portion 
of  cells  in  the  population  remain  unactivated. 
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where 


M  =log  {E[x„]) 


Var(xa)\ 
E[xa]2  ) 


a 


2 


log 


Var(xa)\ 
E[xaY  )  ' 


Under  such  a  parametric  assumption,  the  population  density  r](t,  x)  is  uniquely  described  by  the  two  parameters 
E[xa]  and  STD[xa ]  =  \JV ar(xa )  (in  addition  to  the  parameters  0  discussed  so  far  in  this  section). 

The  integral  in  Equation  (18)  can  be  easily  computed  via  the  midpoint  rule.  Let  {x™}  be  a  set  of  evenly 
spaced  points  with  spacing  Axa ■  Then 


M 

V(t,x)  «  ^  n(t,x;x™)p(x™)Axa 

m=  1 


(19) 


As  written,  Equation  (19)  requires  the  computation  of  M  forward  solutions  in  order  to  approximate  the  total 
population  density.  However,  this  computationally  intensive  approach  can  be  avoided  by  a  change  of  variables. 
Define  y  =  log10(cc  —  xa)  and  h{t,y )  =  10y  log(10)n(t,  x(y))  =  10y  log(10)n(t,  10y  +  xa).  Then  the  system  (6) 
becomes 


dho  ce~kt  dho 

dt  log  10  dy 

dhi  ce~kt  dfi\ 

dt  log  10  dy 


(a0{t)  +  0o(t)  -  ce  kt)h0(t,x) 

(ai(t)  +  (t)  -  ce~kt)n1(t,  x)  +  2a0(t)h0(t,  y  +  log10  2) 


(&,=»*(*)  -  ce  x)  +  2aimax_i(t)nimax_i(t,  y  +  log10  2).  (20) 

It  is  then  clearly  observed  that  the  parameter  xa  no  longer  appears  in  the  system  of  equations  for  the  compart- 
rnental  model  in  the  structure  variable  y,  while  only  the  new  initial  condition, 

h{y)  =  10w  log(10)$o(10y  +  Xa),  (21) 

will  now  depend  on  xa ■  However,  provided  the  initial  uptake  of  CFSE  in  the  experimental  procedure  results  in 
cells  with  measured  FI  significantly  greater  than  their  AutoFI  (which  is  always  the  case  for  useful  experimental 
data),  $o(a;)  =  0  unless  x  »  xa  (and  hence,  unless  10y  >>  xa).  As  such,  the  dependence  of  the  initial  condition 
on  the  parameter  xa  can  be  safely  ignored.  This  fact  is  demonstrated  with  an  example  in  Figure  5.  In  general, 
it  is  expected  that  CFSE-labeled  cells  are  approximately  100-1000  times  brighter  than  unlabeled  cells  (see,  e.g., 
[55,  61,  74];  as  mentioned  previously,  the  actual  measured  FI  values  depend  on  machine  calibration,  and  hence 
will  vary  from  experiment  to  experiment).  For  the  initial  condition  corresponding  to  our  particular  data  set  of 
interest,  it  is  reasonable  to  assume  E[xa\  ~  10.  In  Figure  5,  a  sample  lognormal  distribution  with  E[xa]  =  12 
and  STD[xa\  =  4  is  depicted  on  the  left.  (These  values  for  the  mean  and  standard  deviation  can  be  taken 
as  maximum,  worst-case  bounds.  It  is  expected  that  the  mean  value  of  xa  is  no  more  than  12,  with  standard 
deviation  less  than  4.)  We  can  assess  the  effect  of  the  parameter  xa  on  $0  (y)  by  computing  4>0  (y)  for  extreme 
values  of  xa  (that  is,  values  in  the  far- left  and  far-right  tails  of  the  density  function).  The  resulting  functions  (as 
well  as  a  third  function,  showing  (j/)  when  xa  =  E[xa])  are  shown  on  the  right  of  Figure  5. 

It  is  clear  from  Figure  5  that  the  initial  condition  function  (for  y  as  a  structure  variable)  changes  only  minimally 
for  any  reasonable  values  of  xa.  (Moreover,  the  original  initial  condition  <I,o(2;)  was  already  approximate,  having 
been  computed  from  experimental  data.)  Thus,  computationally,  when  computing  the  structured  population 
density  according  to  (18),  we  compute  only  a  single  initial  condition  from  Equation  (21)  using  xa  =  E[xa\.  The 
system  (20)  can  then  be  solved  to  obtain  h{t,  y)  (which  does  not  depend  on  xa  at  all).  Next,  for  each  value  of  xa 
in  (19),  one  can  compute 

__  n(t,y(x))  __  h(t,\ og10Qr  -  xa) 

n(-hX'Xa)  log(10)Or  -Xa)  log(10)(x  —  Xa)  ’ 


dhi 


~  —  kt 


dhi 


dt  log  10  dy 
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Figure  5:  Left:  A  hypothetical  lognormal  AutoFI  distribution  with  E[xa\  =  12  and  Var{xa )  =  4.  Right:  Initial 
conditions  (in  the  structure  variable  y)  computed  for  the  mean  value  of  xa  (solid  line)  as  well  as  two  for  two 
extreme  values  of  xa.  One  can  see  that  the  value  of  the  parameter  xa  has  very  little  effect  on  the  initial  condition 

$0  (y)- 


in  order  to  determine  the  population  structured  density  r](t,x).  It  should  be  noted  that,  while  the  change  of 
variables  from  x  to  y  eliminates  the  parameter  xa  from  the  system  of  PDEs,  and  we  have  shown  that  the  effect  of 
xa  on  the  initial  condition  $0(2/)  is  negligible,  it  is  not  true  that  the  parameter  xa  can  be  ignored  entirely.  The 
negligible  effect  of  xa  on  the  initial  condition  is  the  result  of  the  brightness  of  CFSE-labeled  cells  at  the  beginning 
of  the  experiment.  However,  as  time  progresses,  CFSE  intensity  is  lost  as  cells  divide  and  CFSE  degrades,  so 
that  AutoFI  constitutes  a  larger  percentage  of  the  measured  FI.  In  other  words,  while  it  is  reasonable  to  assume 
n(0,  x)  =  $o(x)  =  0  unless  x  »  xa,  this  assumption  does  not  hold  more  generally  for  n(t,  x)  ( t  >  0). 

Finally,  when  using  Equation  (19)  to  approximate  the  total  population  density,  one  must  make  certain  that 
the  parameter  M  is  sufficiently  large  to  provide  desired  accuracy.  In  Figure  6,  a  sample  density  is  computed  at 
t  =  120  hours  using  three  different  values  of  M .  Given  the  discussion,  above,  there  is  essentially  no  difference 
in  computational  time  as  M  changes.  While  the  solution  is  not  accurately  captured  for  M  =  10,  there  is  no 
measurable  difference  between  the  solutions  for  M  =  100  and  M  =  1000.  Henceforth,  if  it  is  assumed  that 
AutoFI  is  distributed  in  the  population  of  cells,  the  total  population  r/(t,x )  will  be  computed  via  Equation  (19) 
with  M  =  100. 

3.4  Remarks  on  the  Inverse  Problem 

At  this  point,  we  have  considered  numerous  different  parameterizations  for  the  proliferation  rate  functions  apt) 
(A1-A5),  and  the  death  rates  Pi  (B1-B5).  Each  of  these  parameterizations  results  in  a  distinct  set  of  parameters 
which  will  need  to  be  estimated  from  the  data.  We  also  have  the  additional  label  loss  parameters  c  and  k,  as 
well  as  the  AutoFI  parameter  which  can  be  considered  either  as  a  fixed  constant  xa  or  as  a  lognormal  probability 
distribution  with  mean  E[xa]  and  standard  deviation  STD[xa\. 

In  the  remainder  of  this  report,  we  will  refer  to  the  model  solution  simply  as  n(t,  x;  6)  where  6  C  K.p  is 
a  set  of  parameters  which  describes  the  model.  (This  includes  the  case  that  xa  is  described  by  a  probability 
measure,  where  r/(t,x)  was  used  in  the  previous  exposition.)  This  is  done  to  simplify  notation,  and  it  will  always 
be  clear  from  context  which  parametrization  is  being  used.  Obviously,  the  value  of  p  will  vary  depending  upon 
the  parametrization.  The  various  possibilities  are  summarized  in  Table  2. 

We  now  return  to  the  OLS  formulation  (16)  of  the  inverse  problem, 

Ools  =  argmin^T^  =  argmin^  {l[n](tj,  z^,  6)  -  N3k)\ 

k,j  k,j 

where  now  0  is  a  closed  bounded  subset  of  Using  the  data  {n3k}  as  realizations  of  the  random  variables  {Nk}, 
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Figure  6:  Effect  of  the  number  of  nodes  M  used  to  approximate  the  total  population  density  rj(t,  x)  in  Equation 
(19).  Using  M  =  100  seems  more  than  sufficient. 


we  would  like  to  compute  the  estimate 

0OLs({nJk})  =  arg min  J(9 \{nk})  =  argnnn^]  (/[«](*,-,  z3k\ 9)  -  nJk)2 .  (22) 

k,j 

However,  we  have  only  an  approximate  numerical  solution  with  which  to  compare  the  data.  Thus  we  actually 
compute  the  approximate  estimate 

0OLs(ht,  Nx,M-  {n{})  =  arg  min  JA(0\{nJk})  =  arg  nun  ^  (/A[h](t,-,  4;  9)  -  n{)2 ,  (23) 

k,j 

where  we  have  now  explicitly  emphasized  the  dependence  of  the  parameter  estimate  on  the  computational  accu¬ 
racy  of  the  numerical  solution.  The  continuous  dependence  of  the  model  solution  h(t ,  z;  9)  on  the  parameter  6 
(regardless  of  which  particular  parametrization  is  used)  follows  easily  from  the  method  of  characteristics  solution 
of  Section  2.1.  Numerical  convergence  with  respect  to  ht ,  Nx,  and  M  follow  directly  from  well-known  results 
regarding  the  trapezoidal  rule  for  quadrature,  linear  interpolation  of  a  smooth  function,  and  the  midpoint  rule  for 
quadrature,  respectively.  As  such,  it  can  be  shown  (see,  e.g.,  the  arguments  of  [15,  Ch.  3]  that  the  approximate 
estimates  0oLs{ht,  Nx,  M ;  {nk})  will  converge  to  some  9qLS  which  minimizes  (22)  as  NX,M  oo,  and  ht  —>  0. 
It  should  be  noted  that  the  possible  nonuniqueness  of  the  minimizer  9qLS  is  a  common  issue  in  inverse  problems. 
We  forgo  techniques  such  as  Tikhonov  regularization  in  this  report,  choosing  to  focus  instead  on  the  accuracy  of 
the  best  fit  models  n(t,  z;9ols)  in  fitting  a  particular  data  set,  regardless  of  uniqueness  (although  these  issues 
must  be  dealt  with  in  order  to  establish  standard  errors,  confidence  intervals,  etc.).  For  the  remainder  of  this 
report,  we  will  not  distinguish  between  Qols ,  or  ®OLs(ht,  Nx,  M;  {nk}).  It  should  also  be  noted  that  this 

best-fit  parameter,  which  is  itself  an  estimate  of  the  random  variable  9ols ,  will  be  data-realization  dependent. 
However,  for  a  good  model  and  a  sufficiently  large  data  set,  Ools  is  an  unbiased  estimator  of  9ols  [21,  29,  63]. 

The  optimization  (23)  has  been  implemented  in  MATLAB  using  the  fmincon  function,  which  is  a  variation 
of  the  BFGS-active  set  algorithm  for  bound-constrained  parameters.  The  parameter  constraints  are  summarized 
in  Table  3. 
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Table  2:  Summary  of  possible  parameterizations  for  the  compartmental  model,  with  the  set  8  £  Rp  of  parameters 
describing  the  model  in  each  case. 


Model 

Parameters 

V 

Model 

Parameters 

P 

A1B1 

9  =  {xa,  c,  k,  ao,  op 

5 

AlBldist 

8  =  {E[xa\,  STD[xa\,  c,  k ,  ao,  «} 

6 

A1B2 

9  =  {xa,c,  k,ao,a,  p} 

6 

AlB2dist 

9  =  {E[xa],  STD[xa],c,  k,a0,a,  p} 

7 

A1B3 

9  =  {xa,c,k,a0,a,/3o} 

6 

AlB3dist 

9  =  {E[xa],  STD[xa\,  c,  k,a0,a,  do} 

7 

A1B4 

9  =  {xa,c,  fc,ao,a,do,d} 

7 

AlB4dist 

9  =  {E[xa},  ST D[xa], c,  k,  ao,  a,  do,  d} 

8 

A1B5 

9  =  {xa,c,  k,ao,a,  {Pi}} 

12 

AlB5dist 

9  =  {E[xa\,  STD[xa],  c,  k,  a0,  a,  {Pi}} 

13 

A2B1 

9  =  {xa,C,  fc,{oi}} 

9 

A2Bldist 

9  =  {E[xa],STD[xa],c,k,{ai}} 

10 

A2B2 

9  =  {xa,  c,  k,  {<*i},  p} 

10 

A2B2dist 

9  =  {E{xa\,  STD[xa\,  c,  k,  {a.p,  p} 

11 

A2B3 

9  =  {xa,c,  fc,{a;},do} 

10 

A2B3dist 

8  =  { E[xa ],  STD[xa\,c,  k,  {ai},  do} 

11 

A2B4 

9  =  {xa,C, do, /?} 

11 

A2B4dist 

9  =  {E[xa],STD[xa],c,k,{ai},Po,P} 

12 

A2B5 

9  =  {xa,c,k,{ai},{/3i}} 

16 

A2B5dist 

9  =  {E[xa],  STD[xa],c,  k,  {ai},  {Pi}} 

17 

A3B1 

9  =  {xa,  c,  k,  ag  ,t*,  a} 

6 

A3Bldist 

9  =  {E[xa],  STD[xa],c,k,ao,t* ,  a} 

7 

A3B2 

9  =  {xa,  c,k,ao,t* ,  a,  /3} 

7 

A3B2dist 

9  =  { E[xa ],  STD[xa],  c,k,ao,t*  ,a,  P} 

8 

A3B3 

9  =  {xa,c,  fc,a0,t*,a,do} 

7 

A3B3dist 

9  =  {E[xa\,  STD[xa],  c,  k,  a0,t* ,  a,  Po} 

8 

A3B4 

9  =  {xa,c,  k,ao,t*,a,  do,d} 

8 

A3B4dist 

9  =  {E[xa],  STD[xa\,c,k,ao,t* ,  a,  do,  d} 

9 

A3B5 

9  =  {xa,c,k,ao,t*,a,{Pi}} 

13 

A3B5dist 

9  =  {E[xa\,  ST D[xa\,  C,  k,  ao,  t* ,  a,  {Pi}} 

14 

A4B1 

9  =  {xa,c,k,ao,t*,{a}} 

10 

A4Bldist 

8  =  {E[xa\,  STD[xa],c,k,a0,t*  ,{a}} 

11 

A4B2 

9  =  {xa,c,  k,ao,t*,{a},  [}} 

11 

A4B2dist 

9  =  {E[xa],STD[xa],c,  k,  ao,  t* ,  {a},  p} 

12 

A4B3 

9  =  {xa,c,  k,ao,t*,  {a},/3 0} 

11 

A4B3dist 

9  =  {E[xa],STD[xa],c,  k,ao,t*,{a},Po} 

12 

A4B4 

9  =  {xa,c,k,a0,t*,{a},p0,p} 

12 

A4B4dist 

8  =  {E[xa\,  STD[xa],c,k,a0,t*  ,{a},  p0,  p} 

13 

A4B5 

9  =  {xa,  c,  k,  ao,  t* ,  {a},  {Pi}} 

17 

A4B5dist 

8  =  { E[xa],STD[xa\,c ,  k,  a0,  t* ,  {a},  {pi}} 

18 

A5B1 

9  =  {xa,c,k,{a(p)}} 

21 

A5Bldist 

9  =  {E[xa],STD[ia],c,fc,{alp)}} 

22 

A5B2 

9  =  {xa,c,  k,  {a(p'1},  P} 

22 

A5B2dist 

9  =  {E[xa],STD[xa\,c,  k,  {a(p) },  d} 

23 

A5B3 

9  =  {xa,  c,  k,  {a ]p)},  do} 

22 

A5B3dist 

9  =  {E[xa],  STD[xa],c,k,{a(ip)}, p0} 

23 

A5B4 

8  =  {xa,c)k,{af'1},  p0,  p] 

23 

A5B4dist 

9  =  {E[xa],STD[xa],c,  k,  {a(p)},  do,  d} 

24 

A5B5 

9  =  {xa,c,  k ,  {a(p)},  {di}} 

28 

A5B5dist 

9  =  { E[xa],STD[xa],c ,  k,  {a(p)},  {Pi}} 

29 

Table  3:  Summary  of  bound-constraints  for  the  OLS  parameter  estimation  problem  (23).  The  parameters  {cxi}, 
{■ af  },  and  {pi}  must  be  positive.  The  feasibility  of  the  remaining  bounds  has  been  determined  computationally. 


Parameter 

Minimum 

Maximum 

1 

20 

E\xa \ 

5 

12 

STD[xa] 

0 

4 

c 

1  x  1CT4 

1  x  10"2 

k 

0 

1  x  10"3 

3. 

f-l 

0 

d 

0 

1 

{Pi} 

0 

1 

3.5  Information  Theoretic  Model  Selection 

Each  possible  parametrization  presented  thus  far  gives  rise  to  a  distinct  mathematical  model  which  can  be  fit  to 
a  data  set  in  the  prescribed  manner.  Based  upon  the  results  for  each  model,  we  would  like  to  determine  which 
parametrization  is  most  appropriate  and  use  those  results  to  draw  conclusions  regarding  division-linked  and/or 
longitudinal  changes  in  the  behavior  of  the  cell  culture.  In  order  to  do  this,  we  must  establish  some  formal 
mechanism  which  permits  the  objective  comparison  of  different  models. 

One  common  approach  is  hypothesis  testing  for  model  refinements  [8,  11].  However,  such  methods  are 
only  useful  for  pairwise  comparisons,  and  are  better  suited  for  comparison  against  an  experimental  control  [24]. 
Moreover,  such  methods  do  not  apply  unless  one  of  the  two  models  in  the  comparison  is  contained  within  the 
other  model  (for  instance,  our  parametrization  B4  contains  B3  as  a  special  case).  While  several  parameterizations 
discussed  in  this  document  are  indeed  contained  within  other  parameterizations,  this  is  not  universally  the  case 
(e.g.,  there  is  no  containment  relationship  between  B2  and  B3). 
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A  more  general  approach,  based  upon  the  premises  of  information  theory,  is  found  in  the  Akaike  Information 
Criterion  (AIC).  Briefly,  for  models  with  independent,  homoscedastic,  normally  distributed  errors,  it  can  be  shown 
that  (recall  that  p  is  the  number  of  parameters  estimated) 

AIC  =  mlog 

y  to 

where  to  is  the  total  number  of  data  points,  is  an  approximately  unbiased  estimate  of  the  “expected  relative 
Kullback-Leibler  distance”  (information  loss)  when  a  model  is  used  to  describe  a  data  set  [24].  Given  a  set  of  R 
models,  the  AIC  can  be  computed  for  each  model;  we  seek  the  model  which  results  in  the  smallest  AIC  value. 
It  should  be  emphasized  that  the  AIC  is  only  an  estimate  of  information  loss,  and  this  estimate  depends  on  the 
particular  data  set  being  used.  (When  comparing  different  models  with  the  AIC,  the  same  data  set  must  be 
used  to  fit  each  model.)  As  discussed  in  Section  3.1,  we  cannot  ascertain  a  priori  that  the  measurement  errors 
are  normally  distributed  with  constant  variance.  However,  the  use  of  an  OLS  framework  already  constitutes 
an  assumption  of  homoscedasticity.  The  assumption  of  normality  does  not  seem  to  be  a  significantly  greater 
burden,  and  we  proceed  with  the  AIC  in  spite  of  these  issues,  recognizing  that  the  use  of  the  AIC  will  be  only 
suggestive.  As  will  be  shown  in  Section  4,  there  is  a  very  clear  preference  among  the  models  when  ranked  by 
the  AIC.  As  such,  we  do  not  expect  our  results  to  change  significantly  for  a  different  error  model.  Similarly, 
the  derivation  of  the  AIC  assumes  that  any  model  which  is  fit  to  the  data  is  sufficiently  accurate  so  that  the 
assumption  E[NJk ]  =  n{tj,Xf,9oLS )  is  valid.  While  this  assumption  may  break  down  for  the  least  accurate  of  the 
models  tested  in  this  report  (see  Section  4),  it  is  a  standard  assumption  in  the  OLS  framework  (16),  provided  the 
estimate  9ols  is  sufficiently  close  to  90  for  each  particular  model. 

There  is  an  element  of  parsimony  in  the  AIC,  as  a  model  which  fits  the  data  poorly  (high  J(9ols ))  or  which 
contains  a  large  number  of  parameters  (high  p)  will  have  a  comparatively  larger  AIC.  Yet,  rather  than  using  the 
AIC  to  determine  a  single  ‘best’  model,  additional  theory  is  available.  If  AICmin  is  the  smallest  computed  AIC 
value,  then  we  can  define  the  AIC  differences 


Ar  =  AICr  -  AIC„ 


(25) 


for  1  <  r  <  R,  where  AICr  is  the  AIC  value  computed  when  model  r  is  fit  to  the  data.  Finally,  we  can  compute 
the  Akaike  weights 


wr 


exp(-^) 

Erexp(^H  ' 


(26) 


It  can  be  shown  (either  by  likelihood  ratio  tests  or  in  a  Bayesian  framework,  see  [24])  that  the  AIC  weight  wr 
can  be  interpreted  as  the  probability  that  model  r  is  the  best  model  to  describe  the  data  (given  the  set  of  R 
possible  models).  Thus,  after  each  model  from  Table  2  is  fit  to  a  data  set,  we  can  compute  the  Akaike  weights  for 
the  set  of  candidate  models  and  use  these  to  assess  the  necessity  of  various  mathematical  features  (e.g.,  division 
dependence  of  cell  death  rates)  in  describing  the  data.  A  complete  derivation  of  the  AIC  and  Akaike  weights,  as 
well  as  numerous  examples  and  exhaustive  references,  can  be  found  in  [24]. 


4  Results  and  Discussion 

The  model  calibration  results  for  each  possible  parametrization  of  the  compartmental  model  considered  in  this  re¬ 
port  are  summarized  in  Table  5  of  [20].  There  the  approximate  OLS  costs  Ja(9ols)  are  given  for  each  parametriza¬ 
tion,  as  well  as  the  computed  AIC  values  and  AIC  differences.  The  models  are  also  ranked  in  terms  of  their  relative 
information  theoretic  loss.  The  AIC  selected  model  is  parametrization  A5B5  with  lognormally  distributed  AutoFI 
(henceforth,  A5B5dist)  with  a  cost  Ja(Pols )  =  3.0535  x  1011.  This  parametrization  resulted  in  a  model  with 
not  only  the  smallest  AIC  value,  but  also  the  lowest  cost  (meaning  that  the  decrease  in  cost  more  than  offset 
the  additional  parameters).  The  optimal  solution  for  parametrization  A5B5dist  is  depicted  in  comparison  to  the 
data  in  Figure  7.  The  estimated  piecewise  linear  proliferation  rates  can  be  found  in  Figure  8,  and  the  estimated 
death  rates  are  summarized  in  Table  4.  For  the  AutoFI  distribution,  the  best-fit  lognormal  distribution  has  mean 
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Figure  7:  Best-fit  solution  /^[n](t,  z;  9qls)  for  parametrization  A5B5dist.  Total  cost  Ja{Sqls)  =  3.0535  x  1011. 


Table  4:  Estimated  death  rates  f3 \  in  terms  of  the  number  i  of  divisions  undergone. 


Divisions 

Death  Rate  (1/hr) 

0 

0.0165 

1 

0.0000 

2 

0.0000 

3 

0.0000 

4 

0.0012 

5 

0.0544 

6 

0.1572 

E[xa\  =  8.739  UI  and  STD[xa\  =  3.534  UI.  The  estimated  Gompertz  label  decay  parameters  are  c  =  5.641  x  10~3 
and  k  =  lx  10-9. 

As  can  clearly  be  seen  in  Figure  7,  the  compartmental  model  (with  suitable  parametrization)  is  capable 
of  accurately  describing  the  particular  data  set  used  for  model  calibration  in  this  report.  The  most  notable 
shortcoming  of  the  model  occurs  at  t  =  24  hours,  where  a  distinct  cohort  of  cells  with  high  CFSE  FI  can  be 
seen  in  the  data  and  is  not  modeled  accurately.  As  discussed  in  [18],  this  cohort  is  believed  to  be  either  cell 
duplets  or  some  other  anomalous  cell  types  which  were  not  properly  gated  out  of  the  measured  cell  data,  and 
such  cells  should  not  be  an  issue  in  future  data  sets.  It  also  appears  that  neither  of  the  two  generations  in  the 
model  solution  at  t  =  48  hours  contains  enough  cells  (when  compared  to  the  data  at  that  time).  This  may  also 
be  partly  explained  as  a  systematic  error  resulting  from  the  presence  of  cell  duplets  in  the  data.  It  is  also  possible 
that  small  errors  associated  with  the  manner  in  which  counted  beads  (see  [68,  Ch.  1])  are  used  to  determine  the 
total  population  size. 
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Figure  8:  OLS  best-fit  piecewise  linear  proliferation  rate  functions  for  each  division  number.  Red  circles  indicate 
nodes  which  were  estimated  in  the  inverse  problem. 
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One  of  the  primary  goals  of  considering  various  parameterizations  for  the  proliferation  and  death  rates  (Section 
3)  was  to  investigate  the  dependence  of  these  rates  on  division  number  and  on  time.  The  best-fit  parametrization 
A5B5dist  features  a  proliferation  rate  which  depends  both  on  time  and  division  number,  as  well  as  a  death  rate 
which  depends  on  division  number.  Additionally,  the  AutoFI  parameter  xa  is  lognormally  distributed,  which  was 
not  considered  in  previous  efforts  [18].  Given  the  overwhelming  weight  assigned  to  the  parametrization  A5B5dist 
in  the  information  theoretic  framework,  it  is  tempting  to  conclude  that  each  of  these  features  is  necessary  in 
accurately  modeling  the  data.  Because  the  data  set  contains  4289  points,  even  a  small  difference  in  OLS  cost 
(compare,  for  example,  parameterizations  A5B5dist  and  A5B4dist)  results  in  significantly  different  AIC  values. 
However,  the  AIC  (24)  is  derived  under  the  assumptions  of  independent,  homoscedastic,  normally  distributed 
errors.  If  these  assumptions  are  not  valid,  particularly  if  the  4289  points  are  not  independent,  then  the  magnitude 
of  the  AIC  differences  may  be  misleadingly  large. 

In  spite  of  these  potential  setbacks,  there  are  still  several  useful  conclusions  which  can  be  safely  drawn.  As 
expected,  the  worst  parameterizations  (in  terms  of  both  OLS  cost  and  AIC  rank)  are  those  which  do  not  permit 
cell  death  in  the  population  (Bl).  Parameterizations  which  feature  probabilistically  distributed  AutoFI  are  more 
accurate  than  parameterizations  which  use  a  constant  parameter  xa  to  describe  AutoFI.  Among  the  models  which 
use  a  constant  parameter  xa  to  describe  AutoFI,  the  most  parsimonious  model  (that  is,  the  AIC  selected  model) 
is  parametrization  A5B4.  (Parameterizations  A5B4  and  A5B5  differ  minimally  in  cost,  but  A5B4  has  fewer 
parameters.)  The  best-fit  solution  for  this  model  is  shown  in  comparison  to  the  data  in  Figure  9.  We  find  that  a 
model  which  fails  to  account  for  variability  in  AutoFI  in  the  population  of  cells  does  not  adequately  describe  the 
increasing  heterogeneity  of  the  population  of  cells  as  division  number  increases.  This  is  particularly  noteworthy 
for  cells  having  undergone  4  or  more  divisions,  where  AutoFI  constitutes  a  comparatively  larger  fraction  of 
the  measured  FI  of  the  cells.  Such  an  observation  has  important  experimental  ramifications  for  the  design  of 
intracellular  dyes.  While  it  has  long  been  known  that  a  population  of  cells  must  obtain  a  high  level  of  FI  (relative 
to  their  AutoFI)  during  the  initial  staining  process  in  order  for  the  experimenter  to  resolve  multiple  rounds  of 
division  in  the  population  [55,  61],  we  now  see  that  the  variability  of  AutoFI  in  the  population  of  cells  also  has  an 
effect  on  the  peak-to-peak  resolution  of  the  data.  While  AutoFI  is  a  property  of  the  cells  being  measured  (it  arises 
from  intracellular  molecules  which  emit  light  in  the  frequency  bands  used  to  detect  the  intracellular  dye),  focus 
may  possibly  be  directed  toward  the  design  of  dyes  with  spectral  properties  that  minimally  overlap  with  common 
intracellular  molecules. 


As  in  previous  work  [18,  19],  we  find  that  time  dependence  is  a  significant  feature  of  the  proliferation  rates, 
given  the  model  formulation  (6).  Significantly,  we  find  that  the  population  of  cells  cannot  be  accurately  modeled 
by  considering  only  a  delay  in  the  time  to  first  division.  For  instance,  the  calibrated  model  using  parametrization 
A4B5dist  (which  is  the  AIC  selected  model  among  those  which  does  not  feature  completely  time  dependent 
proliferation)  is  shown  in  Figure  10.  This  parametrization  does  not  permit  any  proliferation  until  t  >  t*,  thus 
enforcing  a  delay  before  any  division  occurs  in  the  population.  Even  with  this  feature,  subsequent  divisions  of  cells 
emerge  too  quickly  in  the  model  solution.  Thus  more  complex  time-dependence  (parametrization  A5)  appears 
to  be  necessary,  as  the  resulting  decrease  in  cost  outweighs  the  additional  parameters. 

The  compartmental  model  was  motivated  by  a  desire  to  compute  quantities  such  as  cell  numbers  from  the 
best-fit  model  solution.  As  noted  above,  previous  methods  for  obtaining  cell  numbers  relied  on  some  form  of 
deconvolution  of  the  histogram  data,  typically  via  fitting  by  a  series  of  normal  or  lognormal  curves.  While  the 
compartmental  model  is  more  mathematically  involved  and  requires  considerably  more  time  for  fitting  to  data  (a 
few  minutes  to  a  few  hours,  depending  upon  the  parametrization  used  and  the  accuracy  of  the  initial  parameter 
guess  for  the  BFGS  algorithm),  it  does  not  require  any  assumption  as  to  the  shape  of  the  distribution  of  cells 
within  a  single  generation.  Given  a  calibrated  model  solution  n(t,  x;  9ols )>  one  can  compute  the  total  number  of 
cells 

/OO 

ni(t,x;60Ls)dx  (27) 


for  each  generation.  It  may  also  be  of  experimental  interest  to  consider  the  number  of  precursors  in  the  population. 
Because  each  cell  division  results  in  the  formation  of  two  daughter  cells  from  a  single  mother  cell,  one  must 
renormalize  (by  a  factor  of  2)  the  total  number  of  cells  in  each  generation  in  order  to  accurately  analyze  the 
proportion  of  cells  proceeding  through  a  specified  number  of  divisions.  Precursors,  then,  are  cells  in  the  original 
population  (that  is,  at  t  =  0  hours)  which  eventually  give  rise  to  other  cells  with  higher  division  numbers  at  later 
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Figure  9:  Among  the  models  which  do  not  use  a  lognormal  distribution  to  describe  AutoFI,  the  AIC  selected 
model  is  parametrization  A5B4.  When  comparing  the  best-fit  solution  to  the  data,  it  is  clear  that  a  model  lacking 
an  AutoFI  distribution  will  result  in  peaks  which  are  too  distinct  when  compared  to  the  data. 


times.  The  number  of  precursors  is 


N  (t)  1  r°° 

=  =  rn(t,x;e0Ls)dx.  (28) 

Given  that  precursors  represent  numbers  of  cells  in  the  original  population,  it  follows  that  the  total  number  of 
precursors 

*max 

p(t)  =  E 
2=0 

cannot  increase  in  time  (but  may  decrease  as  a  result  of  cell  death).  Cell  numbers  and  precursor  numbers  have 
been  computed  from  the  best- fit  model  solution  (parametrization  A5B5dist)  and  are  shown  in  Figure  11  for 
undivided  cells  (N0(t),  Po(t))  and  divided  cells  -/Vj(t),  as  we^  as  t°tal  cells.  It  follows  that 

such  curves  could  easily  be  used  to  determine  such  parameters  as  approximate  doubling  time  for  the  population, 
or  the  fraction  of  cells  which  do  not  divide.  These  parameters  may  be  of  particular  importance  in  accounting 
for  changes  in  behavior  as  a  function  of  experimental  conditions  (e.g.,  strength  of  stimulation)  or  in  a  diagnostic 
setting. 
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Figure  10:  Best-fit  model  solution  with  parametrization  A4B5dist,  the  AIC  selected  model  among  the  subset  of 
models  which  does  not  feature  completely  time-dependent  proliferation.  While  this  parametrization  includes  a 
delay  before  the  first  division  is  reached,  this  is  still  insufficient  to  describe  the  data  as  cells  proceed  through 
subsequent  rounds  of  division  too  quickly.  The  discontinuity  in  the  model  solution  at  t  =  48  hours  is  a  result  of 
a  sudden  change  in  the  size  of  the  histogram  bins  on  which  the  data  is  specified. 


Figure  11:  Total  cell  counts  (left)  and  total  precursors  (right)  in  terms  of  undivided  cells,  divided  cells,  and  total 
cells.  The  values  are  computed  from  the  best-fit  model  solution  n(t,x;doLs)  with  parametrization  A5B5dist. 
Numerical  values  at  data  collection  times  are  summarized  in  Tables  5  and  6.  The  slight  increase  (less  than  0.3%) 
in  the  total  number  of  precursors  between  i  «  60  hours  and  t  ~  90  hours  is  within  the  range  of  numerical  error 
for  the  computed  solution. 
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5  Discussion 


In  this  report,  a  label  structured  system  of  PDEs  for  a  population  of  dividing  cells,  indexed  by  the  number 
of  divisions  undergone,  is  derived  and  fit  to  data.  Under  the  appropriate  assumptions  for  the  label  loss  rate, 
autofluorescence  parameter,  proliferation  rates,  and  death  rates,  such  a  model  can  accurately  fit  an  experimental 
data  set  (Figure  7).  Because  each  generation  of  cells  is  mathematically  described  by  a  separate  structured  density 
function,  the  proliferation  and  death  rates  can  be  estimated  directly  in  terms  of  division  number,  and  there  is 
no  need  for  any  parametrization  of  these  rates  in  the  structure  variable.  This  is  in  contrast  to  the  previous 
fragmentation  model  (1)  from  [18].  The  AlC-selected  best-fit  compartmental  model  contains  29  parameters  and 
results  in  a  best-fit  OLS  cost  of  Ja(Pols)  =  3.0535  x  1011  while  the  best-fit  fragmentation  model  contained 
73  parameters  and  resulted  in  a  cost  of  3.0901  x  1011.  Thus  the  compartmental  model  appears  quite  superior 
to  the  previous  fragmentation  model,  as  it  contains  fewer  parameters  and  has  a  lower  OLS  cost.  Additionally, 
the  compartmental  model  can  be  used  to  compute  cell  numbers  in  terms  of  the  number  of  divisions  undergone. 
Certainly  it  may  be  possible  to  decrease  the  number  of  parameters  in  the  fragmentation  model  by  changing  the 
placement  of  the  nodes  used  for  the  estimation  of  the  proliferation  and  death  rate  functions.  Yet  even  if  the 
total  number  of  parameters  could  be  decreased  significantly  without  increasing  the  OLS  cost  of  the  fragmentation 
model,  it  still  could  not  be  used  directly  to  compute  cell  numbers. 

It  is  interesting  to  note  that  using  the  compartmental  model  we  have  found  variability  in  AutoFI  to  be  an 
essential  feature  of  an  accurate  mathematical  model.  Yet  the  fragmentation  model  assumes  only  a  constant  value 
of  AutoFI  without  significant  sacrifice  in  accurately  fitting  the  data  (see  [18]).  The  explanation  for  this  unusual 
observation  is  the  manner  in  which  the  proliferation  rate  is  parameterized  as  a  function  of  the  structure  variable 
(or  the  ‘translated  variable’)  in  the  fragmentation  model.  The  large  number  of  nodes  used  for  the  structural 
dependence  of  that  proliferation  rate  (13  nodes)  allows  for  significant  variability  in  the  proliferation  rate,  even 
among  cells  which  are  sufficiently  close  in  the  structural  coordinate.  Because  the  Gompertz  function  for  label 
decay  assumes  that  the  rate  of  FI  loss  is  directly  proportional  to  the  quantity  of  FI,  a  group  of  cells  which  divides 
immediately  and  then  pauses  will  lose  less  label  than  a  group  of  cells  which  waits  for  some  time  and  then  divides. 
In  other  words,  the  variability  of  the  proliferation  rate  induces  a  variability  in  the  label  loss  rate.  As  a  consequence 
of  this  observation,  it  would  be  interesting  to  compare  the  effects  of  probabilistically  distributed  AutoFI  with  the 
effects  of  probabilistically  distributed  label  loss  rates  in  the  compartmental  model. 

The  major  advantage  of  the  compartmental  model  over  previous  efforts  is  the  ability  to  compute  cell  num¬ 
bers  directly  from  the  model  solution  (Figure  11).  Because  the  compartmental  model  can  be  used  to  estimate 
the  numbers  of  cells  (or  precursors)  having  undergone  a  specified  number  of  divisions,  biologically  meaningful 
parameters  can  be  assessed  directly  in  terms  of  division  number.  For  instance,  the  total  number  of  precursors 
in  the  population,  as  a  fraction  of  the  original  number,  provides  a  meaningful  estimation  of  cell  viability.  The 
total  number  of  cells  in  the  population  can  be  used  to  estimate  the  population  doubling  time.  As  more  complex 
experiments  are  conducted,  the  compartmental  model  could  be  easily  generalized  to  account  for  division-linked 
changes  (surface  marker  expression/differentiation,  genetic  mutations,  etc.).  Such  features  should  be  useful  when 
comparing  results  from  different  data  sets,  such  as  when  attempting  to  quantify  the  effects  of  a  given  chemical 
reagent,  or  distinguishing  between  diseased  and  healthy  cells. 

Of  course,  the  meaningful  comparison  of  parameter  estimates  between  multiple  data  sets  and  experimental 
conditions  relies  upon  quantification  of  the  levels  of  uncertainty  in  the  estimated  parameters.  This  quantification, 
typically  in  the  form  of  confidence  bounds,  is  premised  upon  the  accurate  specification  of  the  statistical  model 
(14)  which  links  the  model  to  the  data.  In  this  report,  the  model  was  fit  to  the  data  in  an  ordinary  least  squares 
sense,  with  the  tacit  assumption  that  the  error  random  variables  £kj  have  mean  zero  and  constant  variance. 
However,  this  assumption  is  not  an  accurate  description  of  the  data.  While  the  misspecification  of  the  statistical 
error  model  does  not  invalidate  the  ability  of  the  compartmental  model  to  (qualitatively)  fit  the  available  CFSE 
data  set,  it  does  impede  the  meaningful  quantification  of  uncertainty  in  the  parameter  estimates.  Some  initial 
discussions  and  results  on  an  appropriate  statistical  model  to  be  used  with  our  mathematical  model  and  inverse 
problem  formulation  (see  [21,  Chapter  3])  are  given  in  [20,  Section  5.1]  and  [68,  Chapter  4],  Work  is  ongoing 
to  establish  a  suitable  mathematical  form  for  the  statistical  model  accurately  linking  the  histogram  data  to  the 
model. 
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Table  5:  Total  numbers  of  cells  in  terms  of  division  number.  For  each  time  and  generation,  the  total  number 
of  cells  has  been  computed  from  the  OLS  best- fit  model  solution  (top),  from  a  deconvolution  of  the  data  using 
normal  curves  (middle)  and  from  a  deconvolution  of  the  data  using  lognormal  curves  (bottom).  While  the  numbers 
computed  from  normal  and  lognormal  curves  are  generally  close  together,  there  are  clear  differences  between  the 
values  computed  from  the  deconvolution  methods  and  those  obtained  with  the  compartmental  model.  The  most 
striking  example  occurs  for  t  =  120  hours  for  cells  having  undergone  6  divisions.  It  is  interesting  to  note  that  the 
division  peaks  in  the  histogram  data  are  not  well-resolved  for  such  cells,  making  the  accurate  determination  of 
cell  numbers  difficult. _ 
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5.1  Comparison  with  Deconvolution  Techniques 

Given  the  applicability  of  the  compartmental  model  (once  calibrated)  to  computing  the  numbers  of  cells  having 
undergone  a  specified  number  of  divisions,  a  relevant  comparison  can  be  drawn  between  the  results  of  a  label 
structured  PDE  model  and  the  commonly  used  deconvolution  techniques.  In  Table  5,  the  number  of  cells  in  each 
generation  is  computed  at  each  measurement  time.  For  each  time  and  generation,  the  top  number  is  computed 
from  Equation  (27).  The  middle  number  is  computed  by  first  fitting  the  function 

*max 

i= 0 

to  the  data  at  a  given  time,  where  K,  Hii  ai)  is  a  normal  density  function  with  mean  //,;  and  standard 

deviation  a, ,  scaled  by  a  factor  ,s, .  The  method  of  fitting  is  ordinary  least  squares.  Then  the  total  number  of 
cells  having  undergone  i  divisions  is  Y^k  tpifeki  si,  Mo  0»).  The  final  number  in  each  block  of  Table  5  is  computed 
in  an  analogous  manner,  but  with  lognormal  rather  than  normal  density  functions. 

Unsurprisingly,  the  two  deconvolution  techniques  (fitting  with  a  series  of  normal  or  lognormal  curves)  provide 
estimates  of  cell  numbers  which  are  fairly  consistent.  However,  these  estimates  occasionally  differ  from  estimates 
obtained  from  the  compartmental  model.  Of  particular  note  is  the  difference  for  cells  having  undergone  6  divisions 
at  t  =  120  hours.  It  should  be  noted  that  this  generation  of  cells  is  very  difficult  to  distinguish  in  this  particular 
histogram  data  set.  Such  poorly  resolved  generations  of  cells  can  be  quite  problematic  for  the  deconvolution 
techniques,  as  the  unique  estimation  of  parameters  (for  the  normal  or  lognormal  curves)  requires  that  distinct 
generations  of  cells  be  plainly  visible.  It  appears  to  be  a  major  advantage  of  the  compartmental  model  to  be 
able  to  fit  data  (and  hence  compute  cell  numbers)  even  when  the  histogram  data  features  generations  of  cells 
which  are  less  than  ideally  resolved.  Of  course,  it  is  not  possible  to  say  from  these  results  which  technique 
(if  either)  is  providing  the  correct  number  of  cells.  Yet,  because  the  compartmental  model  is  derived  from  a 
conservation  law,  and  this  conservation  law  must  hold  regardless  of  the  parameters  input  into  the  model,  cells 
cannot  enter  or  leave  the  population  except  as  permitted  by  the  form  of  the  model  and  the  given  parameters. 
Meanwhile,  the  deconvolution  techniques  do  not  arise  from  any  conservation  law,  and  the  computed  cell  numbers 
in  each  generation  may  increase  or  decrease  freely,  unrestrained  by  any  balance  law.  It  seems  then,  that  the 
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Table  6:  Total  precursors  in  terms  of  divisions  number.  For  each  time  and  generation,  the  total  number  of 
precursors  has  been  computed  from  the  OLS  best- fit  model  solution  (top),  from  a  deconvolution  of  the  data  using 
normal  curves  (middle)  and  from  a  deconvolution  of  the  data  using  lognormal  curves  (bottom).  As  in  Table  5  we 
find  general  agreement  between  the  values  computed  from  deconvolution  techniques,  which  are  slightly  different 
than  those  computed  from  the  compartmental  model.  Under  the  assumptions  of  the  experiment,  the  total  number 
of  precursors  should  not  increase. 
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compartmental  model  should  have  a  major  advantage  in  computing  cell  numbers,  owing  to  its  ‘memory’  of  the 
number  of  cells  determined  at  previous  time  points  (even  when  the  generations  of  cells  are  poorly  resolved  in  the 
data). 

This  is  particularly  noteworthy  in  Table  6,  where  the  number  of  precursors  for  each  generation  of  cells  is 
computed.  As  in  Table  5,  each  block  of  Table  6  contains  the  results  computed  from  the  compartmental  model, 
deconvolution  with  normal  curves,  and  deconvolution  with  lognormal  curves.  Observe  the  significant  increase 
(more  than  10%)  in  the  total  number  of  precursors  as  computed  by  deconvolution  between  t  =  48  and  t  =  96 
hours.  As  discussed  above,  the  total  number  of  precursors  cannot  increase  in  a  population  of  cells.  While  the 
number  of  precursors  computed  from  the  compartmental  model  also  increases,  it  does  so  by  a  very  small  amount 
(less  than  0.3%)  consistent  with  the  error  in  the  numerical  solver.  Of  course,  it  has  already  been  noted  that  some 
data  sets  do  in  fact  exhibit  increases  in  the  total  number  of  precursors-a  discrepancy  arising  from  the  inaccuracy 
of  the  assumption  that  each  well  plate  contains  an  identical  population  of  cells.  On  one  hand,  the  deconvolution 
techniques  would  seem  to  have  an  advantage,  as  they  are  not  constrained  by  any  conservation  law.  However, 
this  has  an  interesting  implication.  Because  the  deconvolution  techniques  do  not  link  the  population  estimates 
from  one  data  collection  time  to  the  next,  there  is  a  potential  bias  associated  with  such  methods  as  a  result  of 
sample-to-sample  variability  in  the  experimental  data.  It  should  be  noted  that  sample-to-sample  variability  is 
also  problematic  for  the  compartmental  model  solution.  If  the  samples  used  to  obtain  the  experimental  data  are 
not  sufficiently  similar,  the  conservation  law  (which  follows  from  the  assumption  that  each  sample  is  identical) 
used  to  derive  the  model  may  not  hold.  In  such  a  case,  the  compartmental  model  would  be  systematically  in 
error  (when  compared  to  the  data),  as  the  calibrated  model  itself  would  still  follow  the  assumed  conservation 
law.  Following  the  discussion  above,  we  believe  that  a  more  accurate  statistical  model,  which  will  necessarily 
include  a  careful  consideration  of  the  method  of  sampling/data  collection,  will  resolve  any  discrepancy  with  the 
compartmental  model.  Some  preliminary  work  on  this  subject  is  surveyed  in  [68,  Chapter  4]. 

5.2  Generalizations  of  the  Mathematical  Model 

Apart  from  issues  involving  the  statistical  model  relating  the  mathematical  model  to  the  data,  it  has  been  shown 
that  the  compartmental  model  accurately  reproduces  the  behavior  of  a  PHA-stimulated  population  of  CD4+  cells 
as  represented  in  histogram  data  from  a  flow  cytometry  assay.  This  model  accounts  for  the  natural  rate  of  CFSE 
FI  decay  resulting  from  turnover  of  the  intracellular  label  as  well  as  the  autofluorescence  of  cells  in  the  absence 
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of  any  fluorescent  labeling.  Simple  linear  models  are  used  to  describe  the  rates  of  cell  division  and  death. 

In  this  paper  only  a  single  CFSE  data  set  has  been  examined  and  used  to  estimate  the  parameters  of  the 
mathematical  model(s).  It  is  believed  that  the  compartmental  model  is  quite  general  and  should  apply  to  a 
wide  range  of  data  sets  from  various  experimental  setups.  Work  is  ongoing  to  analyze  additional  data  sets  to 
demonstrate  such  a  wide  applicability  of  the  model.  As  additional  data  sets  become  available,  several  additional 
features  may  need  to  be  considered  at  greater  length. 

It  is  hoped  that  the  compartmental  model  can  be  generalized  to  account  for  multiple  cell  types  both  in  vivo 
and  in  vitro.  While  the  cells  studied  in  this  report  were  cultured  in  a  saturating  quantity  of  the  stimulating  agent 
PHA,  cells  in  vivo  (or  even  cells  in  vitro  in  a  different  experimental  setup)  will  not  experience  such  a  strong, 
constant  stimulation.  As  such,  the  possibility  exists  that  some  cells  may  return  to  a  quiescent  state  during  the 
proliferation  assay.  It  is  known  that  the  autofluorescence  of  a  cell  changes  depending  upon  its  state  of  activation, 
and  thus  this  mechanism  may  need  to  be  included  in  subsequent  modeling  efforts.  (For  the  current  data  set,  the 
quiescent  cells  are  all  undivided,  and  AutoFI  is  negligible  for  those  cells.) 

Similar  to  the  efforts  in  [18,  19],  we  have  used  Malthusian  rates  for  both  proliferation  (with  time-dependent 
rates  cq(f))  and  death  (with  rates  /?* ) .  As  discussed  in  Section  3,  such  an  assumption  is  reasonable  provided 
the  turnover  of  cells  (resulting  either  from  division  or  death)  occurs  at  a  sufficiently  rapid  pace.  Given  the 
physiological  constraints  placed  on  rapidly  dividing  cells  (e.g.,  rates  of  growth  and  DNA  replication),  one  would 
expect  some  sort  of  minimum  cell  cycle  time.  It  is  unclear  if  the  necessity  of  time  dependence  in  the  Malthusian 
rates  cq  is  an  artifact  of  such  a  feature.  To  test  this  hypothesis,  several  generalizations  of  the  proliferation  and 
death  rate  terms  are  immediately  available. 

First,  one  might  consider  the  addition  of  a  second  structure  variable  (say,  volume)  which  could  be  used  to 
enforce  a  minimum  cell  cycle  time  by  requiring  that  cells  progress  from  some  size  V  to  2V  before  dividing,  at 
which  point  two  cells  of  size  V  are  produced.  However,  in  the  absence  of  additional  observations,  it  is  unclear 
what  parameters  (e.g.,  average  rate  of  growth,  or  the  parameter  V)  might  be  estimable  from  CFSE  histogram 
data.  Video  microscopy  measurements  by  Hawkins,  et  al.  [42]  indicate  that  average  cell  size  may  be  division 
dependent,  and  this  may  add  some  additional  complexity  to  the  inclusion  of  volume  structure.  Biologically,  it 
is  expected  that  apoptosis  occurs  only  at  particular  checkpoints  in  the  cell  cycle  (particularly  if  external  ‘kill 
signals’  are  absent)  so  that  a  generalization  to  volume  structure  (or  any  other  surrogate  for  cell  cycle  position 
or  physiological  age)  may  permit  a  more  accurate  description  of  cell  death.  Still,  it  is  unclear  what  information 
might  be  estimated  from  only  CFSE  histogram  data.  It  is  possible  that  the  forward  scatter  (FSC)  of  laser  light 
may  possibly  be  used  as  an  observable  surrogate  for  cell  size,  and  some  additional  work  will  be  necessary  to 
investigate  this  possibility. 

A  second  possibility  to  generalize  the  rates  of  proliferation  and  death  would  be  to  consider  rate-limiting  (e.g., 
logistic,  Gompertz)  models  for  proliferation  and  death.  Some  biological  mechanisms  have  been  proposed  which 
may  lead  to  density-dependent  rates  of  cell  death  [27],  and  a  Gompertz  model  for  cell  growth  has  been  used  to 
account  for  quiescence  in  the  context  of  a  size-structured  population  model  [40].  Of  course,  generalizations  to 
nonlinear  division  and  death  must  be  considered  in  the  context  of  the  improvement  they  provide  in  fitting  a  given 
model  to  CFSE  data  sets.  Given  the  accuracy  of  the  simple  linear  models  (albeit  with  time-dependent  rates  of 
proliferation),  such  generalizations  seem  unnecessary  at  the  moment. 

Given  that  the  compartmental  model  can  be  used  to  compute  numbers  of  cells  per  generation  directly,  some 
comparison  has  already  been  made  between  the  results  obtained  with  this  model  and  the  cell  numbers  computed 
from  deconvolution  techniques  (Tables  5  and  6).  It  remains  to  compare  the  parameter  estimates  and  model  fits 
obtained  with  the  compartmental  model  with  those  obtained  from  previous  models  (Smith-Martin,  cyton,  etc.). 
In  fact,  it  may  be  possible  to  incorporate  into  the  compartmental  model  the  mathematical  forms  used  to  describe 
proliferation  and  death  in  these  models.  Recall  that  the  method  of  characteristics  provides  a  solution  (equations 
(9)  and  (11))  of  the  form 

rii(t,x(t;  s))  =  ^(Division, Death)  (29) 

where  x(t\  s)  is  the  characteristic  line  emanating  from  the  point  (0,  s)  in  the  te-plane.  Clearly,  the  left  side  of 
Equation  (29)  is  independent  of  any  mathematical  formulation  of  cell  proliferation  and  death.  In  this  report, 
the  form  of  the  hypothetical  function  F  is  determined  from  the  PDE  formulation  of  the  compartmental  model 
(2)  and  the  accompanying  assumptions  regarding  the  Malthusian  rates  of  proliferation  and  death.  Alternative, 
one  could  consider  using  (29)  or  its  differential  form  (i.e.,  (10))  as  a  starting  point,  defining  the  right  side  of 
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the  equation  in  accordance  with  the  assumptions  of  the  Smith-Martin  or  cyton  models,  or  their  generalizations 
[34,  41,  47,  58,  73].  While  previous  authors  have  derived  these  models  specifically  in  terms  of  total  cell  numbers, 
(29)  could  be  related  back  to  previous  work  by  simple  integration.  The  primary  advantage  in  using  (29)  would 
be  in  the  direct  comparison  of  the  model  to  histogram  data,  rather  than  from  computed  cell  numbers.  Further 
study  could  reveal  the  extent  (if  any)  to  which  such  a  direct  comparison  improves  the  unique  identification  of 
parameters  in  previous  models,  although  this  will  first  rely  on  an  accurate  statistical  model. 

In  this  context,  it  is  clear  that  several  alternative  possibilities  exist  for  a  mathematical  description  of  prolif¬ 
eration  and  death  rates.  Thus  it  is  clear  that  the  interpretation  of  the  proliferation  and  death  parameters  must 
be  made  with  careful  regard  to  the  form  of  the  model.  Given  the  form  of  the  model  solution  (Equations  (9)  and 
(11))  for  the  compartmental  model,  it  is  plainly  observed  that  linear  changes  in  parameters  for  proliferation  and 
death  rates  cause  an  exponential  response  in  the  computed  solution  [32,  38].  As  such,  the  sensitivity  of  the  model 
to  these  parameters,  as  well  as  the  degree  to  which  their  estimation  is  unique,  must  be  carefully  considered  when 
interpreting  estimated  parameters.  The  uniqueness  of  the  estimated  functions  oti(t)  will  depend  on  how  the  nodes 
for  the  linear  splines  are  chosen  in  relation  to  the  times  at  which  data  is  taken.  In  some  models,  it  has  been 
shown  that  the  effects  of  a  linear  increase  of  cell  cycle  time  with  division  number  cannot  be  distinguished  from 
the  effects  of  a  linear  increase  in  the  death  rate  with  division  number  [48].  If  this  is  the  case,  then  the  biological 
interpretation  of  some  parameters  may  be  suspect. 

Ideally,  the  values  of  cti(t)  and  /3,  can  be  related  back  to  more  physical/experimental  parameters  such  as 
the  type  and  strength  of  stimulation,  which  may  in  turn  require  the  mathematical  modeling  of  certain  molecular 
pathways  within  individual  cells.  Recent  work  has  indicated  that  the  mechanisms  responsible  for  cell  proliferation 
and  death  may  be  mutually  dependent  upon  a  common  molecular  pathway  [33,  67].  As  more  data  becomes 
available,  we  hope  to  examine  how  the  estimated  parameters  change  under  various  experimental  conditions,  with 
an  eye  toward  additional  constitutive  relationships  linking  molecular  and/or  subcellular  functions  to  population 
dynamics  [25].  In  this  context,  it  seems  necessary  to  consider  the  extent  to  which  these  functions  and/or  pathways 
are  inherited.  Evidence  suggests  that  closely  related  cells  exhibit  strong  correlation  in  times  to  divide  and  some 
correlation  in  times  to  die,  and  that  this  correlation  tends  to  decrease  with  the  number  of  divisions  undergone 
[42].  Cells  with  a  common  precursor  may  also  share  a  common  division  destiny  [42],  which  can  be  altered  by 
stimulation  conditions  [70].  While  computed  cell  numbers  are  relatively  unaffected  provided  correlation  is  limited 
to  cells  having  undergone  the  same  number  of  divisions  [34,  42,  45],  correlation  between  subsequent  division  of 
cells  can  alter  the  dynamics  predicted  by  a  mathematical  model  [73].  For  large  populations,  this  effect  seems 
negligible,  but  may  play  an  important  role  in  vivo  where  only  a  small  number  of  responding  cells  can  trigger  an 
immune  response  [73].  Cyton  models  and  branching  process  models  have  been  formulated  to  account  for  various 
levels  of  correlation  [34,  45,  73],  and  these  models  may  be  incorporated  into  the  compartmental  model  framework 
as  described  above.  Alternatively,  it  may  be  possible  (given  any  reasonable,  identifiable  parameterizations  of  cell 
division  and  death)  to  place  probability  distributions  on  these  parameters  (e.g.,  on  the  functions  cti(t)  and  f3i(t)) 
[6,  9,  12]  in  the  manner  described  in  Section  3.3. 

5.3  Concluding  Remarks 

Our  compartmental  model  is  the  latest  in  a  series  of  structured  PDE  models  which  can  be  fit  directly  to  histogram 
representations  of  flow  cytometry  data.  Once  calibrated,  the  compartmental  model  can  be  used  to  quickly  and 
accurately  estimate  the  numbers  of  cells  having  undergone  a  certain  number  of  divisions.  This  information  can  be 
used  to  determine  biologically  relevant  parameters  which  will  help  to  meaningfully  compare  cells  from  different 
donors  and  experiments.  While  the  use  of  cell  numbers  per  generation  is  not  new,  the  direct  modeling  of  histogram 
data  reduces  any  need  for  deconvolution  techniques  which  may  introduce  unnecessary  bias  into  the  computed  cell 
numbers.  Moreover,  because  the  model  is  based  upon  conservation  principles,  it  should  be  possible  to  fit  histogram 
data  even  when  the  ‘peaks’  in  the  data  (representing  distinct  generations  of  cells)  are  not  well-resolved.  This  is  a 
significant  advantage  over  deconvolution  techniques.  The  actual  number  of  generations  which  can  be  accurately 
modeled  (that  is,  the  maximum  value  of  imax)  will  depend  upon  the  uniformity  of  the  initial  uptake  of  intracellular 
dye  as  well  as  the  magnitude  of  the  resulting  CFSE  FI  relative  to  cellular  AutoFI. 

We  are  actively  working  to  collect  additional  data  sets  with  which  to  demonstrate  the  widespread  applicabil¬ 
ity  of  this  model,  as  well  as  to  use  this  model  in  a  systematic  fashion  to  analyze  how  the  estimated  parameters 
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vary  under  changing  experimental  and  biological  conditions.  Most  immediately,  this  will  require  the  develop¬ 
ment  of  an  accurate  statistical  model  for  the  data.  The  generalization  of  the  model  to  multiple  cell  types  is 
immediate,  although  an  accurate  quantification  of  any  interaction  terms  will  require  some  careful  thought  and 
experimentation. 

As  more  information  becomes  available  regarding  the  complex  processes  involved  in  cell  proliferation,  we  are 
confident  that  the  model  discussed  here  provides  a  firm  physiological  foundation  upon  which  CFSE-based  assay 
data  can  be  understood.  We  strongly  believe  that  the  ideas  and  results  presented  here  will  form  an  important 
interpretive  framework  with  a  wide  array  of  applications  in  experimental  settings,  diagnostic  tests  [35],  and 
perhaps  in  a  more  integrated  model  of  cell  dynamics  [46,  50]. 
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